
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

2023-03-20 15:36:10
Li Li, Hubert P. H. Shum, Toby P. Breckon

Abstract

Whilst the availability of 3D LiDAR point cloud data has grown significantly in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational cost. In addition, many approaches use uniform sampling to reduce the amount of ground-truth data needed for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture and requires fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To make the most of the limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, with a 2.3x reduction in model parameters and 641x fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
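
To give a feel for why a depthwise separable design reduces the parameter count, the sketch below compares the weights in one standard 3D convolution layer with those in a depthwise-plus-pointwise pair. It is a generic illustration of depthwise separable convolution, not the authors' Sparse Depthwise Separable Convolution module. The function names, the channel sizes (64 in, 128 out) and the 3x3x3 kernel are assumptions chosen for the example. The per-layer ratio it computes is not the network-wide 2.3x figure quoted in the abstract, which covers the whole model.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a standard k x k x k 3D convolution (bias omitted)."""
    return k ** 3 * c_in * c_out


def depthwise_separable_conv3d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a depthwise k x k x k convolution followed by a
    1 x 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k ** 3 * c_in   # one k^3 filter per input channel
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out = 64, 128
    standard = conv3d_params(c_in, c_out)
    separable = depthwise_separable_conv3d_params(c_in, c_out)
    print(f"standard 3D conv weights:      {standard}")
    print(f"depthwise separable weights:   {separable}")
    print(f"per-layer reduction factor:    {standard / separable:.1f}x")
```

The saving comes from factorising one dense kernel into a per-channel spatial filter and a channel-mixing step. In a sparse convolution the kernel weights are typically stored densely, with sparsity deciding only which voxels are evaluated, so the same weight count should apply under that assumption.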


URL

https://arxiv.org/abs/2303.11203

PDF

https://arxiv.org/pdf/2303.11203.pdf

