PCAlign: a general data augmentation framework for point clouds

In this part, we evaluate the performance of PCAlign in point cloud-based semantic classification tasks. We use a cloud computing platform to provide an efficient and fair experimental environment, equipped with an Intel i9 3.0 GHz CPU and an A100 GPU. The operating system is Windows 11, and the deep learning platform is built on PyCharm and PyTorch. The test datasets are ModelNet40[28] and ShapeNet[29], which have been used in many related research works. The point cloud learning networks used for enhancement include PointNet[1], PointNet++[2], DGCNN[7], and PCT[10]. To reveal the performance of PCAlign, we conduct tests on ModelNet40 to investigate its impact on classification tasks under various variations and feature enhancements, including random rotation, resampling, and normal vector augmentation. We also evaluate the improvement of PCAlign in the part segmentation task based on ShapeNet models. Finally, we provide a comprehensive analysis to illustrate the operating mechanism of PCAlign.

We used the Adam optimizer with momentum and weight decay set to 0.9 and 0.0001, respectively. We warm up the network for 10 epochs and employ a cosine learning rate schedule for the remaining epochs, decreasing the learning rate to 0.000001 at the final epoch. For the ModelNet dataset, we trained for 150 epochs with an initial learning rate of 0.5. For the ShapeNetPart dataset, we trained for 200 epochs with an initial learning rate of 0.05. Since we use multiple point cloud copies, the batch size was set to 24.

Evaluation for classification

As mentioned before, PCAlign can be combined with any deep neural network to obtain semantic analysis results. To evaluate the performance of PCAlign in the semantic classification task, we use four mainstream networks to provide quantitative results. For a fair comparison, we retrained all network architectures from scratch with a fixed number of epochs (200).
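The training schedule above (10 warmup epochs, then cosine decay from the initial learning rate down to 0.000001 at the final epoch) can be sketched as a small helper. This is our reading of the setup, not the authors' code; the linear warmup shape is an assumption, since the paper does not state how the warmup ramps.

```python
import math

def lr_at_epoch(epoch, total_epochs=150, base_lr=0.5,
                warmup_epochs=10, min_lr=1e-6):
    """Cosine learning rate schedule with warmup, using the paper's
    ModelNet settings (150 epochs, initial LR 0.5, final LR 1e-6).
    The linear warmup ramp is an assumption."""
    if epoch < warmup_epochs:
        # linear warmup up to the base learning rate
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine decay over the remaining epochs down to min_lr
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

In practice this corresponds to PyTorch's warmup-plus-`CosineAnnealingLR` pattern; the sketch only makes the per-epoch values explicit.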
In Table 1, we report the classification accuracy of the different methods. Benefiting from PCAlign, most networks achieve performance improvements, which demonstrates the excellent generalization of PCAlign for enhancing feature learning capability. As a general data augmentation framework, PCAlign also needs to demonstrate superior performance compared to other data augmentation approaches. For this purpose, we compare different data augmentation frameworks with PointNet++ and PCT on classification tasks, including PointAugment[23], PointWOLF[30], PolarMix[17], and PseudoAugment[18]. In Table 2, we report the quantitative results. It is clear that PCAlign achieves better results.
Table 1 Comparisons of mACC, OA, and the computational complexity of different deep neural networks with and without PCAlign in the classification task based on ModelNet40.

Table 2 Comparisons of different data augmentation frameworks in the classification task based on ModelNet40.

Fig. 6 Visualization of point clouds with random rotations and related aligned copies.

Fig. 7 Visualization of classification accuracy changes according to the network iterations for different methods. The label (P) means that the method is enhanced by PCAlign.

As can be seen from the computational complexity in Tables 1 and 2, the parameter count remains consistent with the original algorithms, because the data augmentation framework designed in this paper does not add any extra parameters. The FLOPs increase fourfold, since the input samples are quadrupled. However, there is no significant increase in inference time, because the four point cloud copies can be processed in parallel during network inference. The only additional cost comes from the maximum probability selection module, and its inference overhead is very small. We must acknowledge, however, that the proposed algorithm incurs a significant increase in time cost during training. In contrast, other data augmentation methods increase the parameter count of the baseline algorithm, while adding only slight increases in FLOPs and time.

Evaluations for random rotations

The ModelNet40 dataset adopts default pose alignment, which weakens the advantage of the rotation-invariant property implemented by PCAlign. Most raw collected point clouds lack unified perspectives, especially for datasets with significant semantic differences across categories.
To obtain a more objective measurement for unpredictable poses of point clouds, we add random rotations to the test dataset and perform the related feature training for the classification task. In Fig. 6, we show some instances of point clouds with random rotations and their aligned copies. PCAlign provides stable pose control. In Table 3, we report the quantitative results of different methods. Compared with the experimental data in Table 1, all reference methods show significant performance degradation when dealing with randomly rotated data. Benefiting from the PCA-based alignment, PCAlign provides rotational robustness that ensures the training process is not affected by varying poses. The classification results based on PCAlign are the same between Tables 1 and 3.
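The PCA-based alignment that produces the copies in Fig. 6 can be sketched as follows. This is our reading of the mechanism, not the authors' exact code: the cloud is centered, rotated into its covariance eigenbasis, and the sign ambiguity of the eigenvectors is resolved by enumerating the four right-handed flip combinations, which matches the paper's four-copy setting. The function name is ours.

```python
import numpy as np

def pca_align_copies(points):
    """Sketch of PCA-based pose alignment: center the cloud, rotate
    it into its principal axes, and enumerate the four sign-ambiguous
    right-handed copies. points: (N, 3) array."""
    centered = points - points.mean(axis=0)
    # eigenvectors of the 3x3 covariance give the principal axes
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    aligned = centered @ eigvecs
    copies = []
    # eigenvector signs are ambiguous; flipping axes in pairs keeps a
    # right-handed frame, yielding four candidate poses
    for sx, sy in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        flip = np.diag([sx, sy, sx * sy])
        copies.append(aligned @ flip)
    return copies
```

Because the covariance of a rotated cloud has the same eigenvalues, every input pose maps into the same small set of canonical copies, which is where the rotational robustness comes from.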
Table 3 Comparisons of different deep neural networks with and without PCAlign in the classification task based on ModelNet40. Random rotations are added to the point clouds before training to estimate the influence of pose.

In fact, the precise pose control provided by PCAlign not only offers rotational robustness but also aids the convergence of the final deep network parameter optimization. The reason is that feature encoding on aligned copies helps capture significant geometric information related to the semantics, while simultaneously eliminating the semantic ambiguity caused by different poses. To demonstrate this hypothesis, we present the classification accuracy as a function of the number of network iterations in Fig. 7. It shows that PCAlign is helpful for convergence.

Evaluations for random resampling

It has been discussed that local neighborhoods with different point distributions or densities have an important influence on data augmentation. The reason is that the neighbor structures determine the feature coding paths in most deep neural networks. In general, the coding paths are constructed by k-nearest neighbor searching and farthest point sampling. Significantly, non-uniform distributions change the k neighbors of points. In the previous tests, the point clouds were pre-processed by uniform simplification to optimize the point distributions. To evaluate the influence of non-uniform distributions on point-based semantic analysis, we provide a quantitative analysis for point clouds with different point distributions. We use random resampling to select points from the point cloud, which does not consider densities. In Fig. 8, we compare the two kinds of resampling results. The random resampling changes the point distributions in different local regions. Based on the changed point clouds, we retrain the networks and report the new classification results in Table 4.
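The two sampling strategies contrasted above can be sketched compactly. Density-agnostic random resampling simply draws indices uniformly, while farthest point sampling (the usual way coding paths are built) spreads selected points out regardless of local density. Function names are ours; this is a minimal O(N·m) sketch, not the preprocessing code used in the experiments.

```python
import numpy as np

def random_resample(points, m, rng):
    """Density-agnostic random resampling used in the robustness test:
    draw m points uniformly, ignoring local densities."""
    idx = rng.choice(len(points), size=m, replace=False)
    return points[idx]

def farthest_point_sample(points, m):
    """Density-aware farthest point sampling: greedily add the point
    farthest from everything selected so far."""
    idx = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(m - 1):
        nxt = int(d.argmax())
        idx.append(nxt)
        # keep, for each point, its distance to the nearest selected point
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return points[idx]
```

On a non-uniform cloud, the random variant keeps proportionally more points in dense regions, which is exactly what perturbs the k-nearest-neighbor coding paths discussed above.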
Experimental data directly reflects the sensitivity of the related neural networks to point distributions in local neighborhoods. Overall, PCAlign achieves more stable results.

Fig. 8 Visualization of random resampling for point clouds.

Table 4 Comparisons of different deep neural networks with and without the improvement of PCAlign in the classification task based on ModelNet40. Random resampling is applied to the point clouds before training to evaluate the influence of point distributions.

Evaluations for normal vector enhancement

In recent experiments and engineering practice, researchers have found that introducing normal vectors as input data can effectively enhance the feature learning ability of deep neural networks. By learning only the coordinates of points, a network naturally becomes sensitive to poses. Once normal vectors are introduced, more local geometric information is incorporated into the feature encoding process, which significantly enhances the encoding capability of the related deep neural networks for point cloud geometric features. To evaluate the influence of normal vector enhancement, we report the classification accuracy of different methods in Table 5. It can be observed that the performance of the majority of methods is improved. Due to the PCA-based alignment, PCAlign already establishes a global normal alignment for point clouds, which limits the additional improvement that normal vector enhancement brings to PCAlign.
Table 5 Comparisons of different deep neural networks with and without the improvement of PCAlign in the classification task based on ModelNet40. Normal vectors are bound to the input as the default data augmentation.

Fig. 9 Visualization of part segmentation of different data augmentation frameworks.

Evaluations for part segmentation

For the improvement of classification, the pose alignment of PCAlign is a straightforward and intuitive approach. Indeed, it can particularly enhance object recognition accuracy in cases where there are significant pose variations between point clouds. To further validate PCAlign's generalization in deep learning tasks, we evaluate its performance on the part segmentation task. In Table 6, we report the improvement of PCAlign for PCT[10] and PG[31]. The segmentation accuracy of most categories is improved. We also compare different data augmentation frameworks with PCT on the part segmentation task. In Fig. 9, some instances are shown. In Table 7, we report the quantitative results. PointAugment[23] and PointWOLF[30] attempt to change the point positions, which brings some diversity to the semantic features. However, such diversity cannot improve the accuracy of local feature detection, especially at the joints of components. PCAlign achieves better results as a general data augmentation framework.
Table 6 Comparisons of different deep neural networks with and without the improvement of PCAlign in the part segmentation task based on ShapeNet.

Table 7 Comparisons of different data augmentation frameworks in the part segmentation task based on ShapeNet. The backbone is PCT[10].

Evaluation of ablation experiments
Table 8 Investigation of different numbers of copies.

Table 9 Ablation study on the multi-channel structure.

To demonstrate the effectiveness of the copy settings, we conducted ablation studies by varying the number of copies sampled from the generated samples. As shown in Table 8, the performance is worst with 1 copy. With 2 copies, performance improves the most, as the voting mechanism takes effect. With 3 copies, performance further improves, reaching its peak at 4 copies. This is because additional votes help correct errors using accurate predictions from the other point cloud copies. However, with 5 and 6 copies, the excessive copies introduce more errors, leading to biased predictions and significantly longer training times. Summing over four copies and taking the maximum value is crude and further degrades performance due to incorrect predictions. However, averaging still performs better than a single copy, demonstrating this module's effectiveness in our study.

We conducted further experiments on the fusion strategy, including averaging, summing, and convolution, with the results presented in Table 9. Averaging and summing yield similar results, significantly lower than those achieved by our proposed module. These methods are too simplistic to filter out the correct point cloud copies; they resemble traditional data augmentation by rotation, but with averaging or summing over the point cloud feature copies. Convolution slightly improves performance over averaging and summing due to the additional network parameters, though the improvement is minor. In contrast, our module achieves peak performance by effectively identifying the correct copy among the 4 point cloud copies.

Analysis

Based on the experimental data, the performance improvement brought by PCAlign is significant. By comparing the data in Tables 1 and 3, it can be observed that random model poses significantly reduce the accuracy of feature learning in traditional deep networks.
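The maximum probability selection contrasted with averaging above can be sketched as follows. This is our interpretation of the module, not the authors' implementation: for each sample, keep the prediction of the copy whose top class probability is highest. Function names and tensor shapes are ours.

```python
import numpy as np

def select_max_probability(logits):
    """Maximum probability selection over copies (our interpretation).
    logits: (B, K, C) for B samples, K copies, C classes.
    Returns the class distribution (B, C) of the most confident copy."""
    # softmax over classes, numerically stabilized
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    # confidence of each copy = its highest class probability
    conf = probs.max(axis=-1)                  # (B, K)
    best = conf.argmax(axis=-1)                # (B,)
    return probs[np.arange(len(best)), best]   # (B, C)

def average_copies(probs):
    """Baseline fusion from Table 9: simple averaging over copies."""
    return probs.mean(axis=1)
```

The contrast with averaging is visible immediately: averaging dilutes a single confident, correctly aligned copy with three uncertain ones, whereas selection keeps it intact.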
PCAlign avoids this loss of feature learning accuracy, especially when the training data has significant pose variations. For point clouds with non-uniform densities, some methods may experience degraded performance. PCAlign can reverse this degradation and make feature learning on non-uniform point clouds more stable. In Table 10, we show the performance fluctuations of the different methods based on the experimental data of Tables 1, 2, 3, 4 and 5. It shows that PCAlign is able to stabilize the performance of various methods and improve their robustness. For the part segmentation task, PCAlign provides improvement for most categories based on the quantitative analysis in Tables 6 and 7. Compared with other data augmentation frameworks, PCAlign achieves better performance.
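Table 10 reports each method's average accuracy together with a fluctuation range (±). The paper does not give the exact formula, so the following is only a plausible sketch of such a summary, assuming the ± value is the half-range of the accuracies across the test conditions.

```python
def fluctuation(accuracies):
    """Summarize accuracies across test conditions as (mean, half-range),
    i.e. the 'avg ± range' form used in Table 10 (formula assumed)."""
    lo, hi = min(accuracies), max(accuracies)
    return sum(accuracies) / len(accuracies), (hi - lo) / 2
```

A smaller half-range for the PCAlign-enhanced variant of a method then corresponds directly to the improved robustness claimed above.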
Table 10 Performance fluctuations of classification accuracy of different methods. The average values of methods enhanced by PCAlign are larger than the original ones. The range of performance fluctuation (±) is significantly reduced by PCAlign, which means that the robustness is improved.

Table 11 Comparisons of PointMLP with and without PCAlign on different test datasets with the mentioned variations of ModelNet40.

Limitations

PCAlign primarily implements data augmentation through pose alignment; it achieves significant improvement when training point clouds have random poses, as shown in Table 3. However, it cannot improve classification performance by adding input features such as normal vectors. The experimental data in Table 5 illustrate that normal vectors cannot improve the classification accuracy of methods with PCAlign-based data augmentation. The reason is that the maximum probability selection mandatorily selects a single pose as the output, which is not suitable for point clouds that belong to the same category but exhibit different distributions. The supplementary role of the normal vectors is diminished precisely because of this mandatory pose selection. Another limitation is that PCAlign removes the random rotation module before training, which may lead to performance degradation. Some networks achieve stronger feature learning capability and improved robustness through the analysis of local neighborhoods accompanied by a random rotation module. In Table 11, we show the classification accuracy of PointMLP[32] with and without PCAlign. Due to the removal of the random rotation module, there is a certain degradation in performance even though the poses of the point clouds are aligned.
