Paper Reading AI Learner

Monocular 3D Object Detection with LiDAR Guided Semi Supervised Active Learning

2023-07-17 11:55:27
Aral Hekimoglu, Michael Schmidt, Alvaro Marcos-Ramiro

Abstract

We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which leverages all modalities of collected data during model development. We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase. During training, we leverage the LiDAR teacher, monocular student cross-modal framework from semi-supervised learning to distill information from unlabeled data as pseudo-labels. To handle the differences in sensor characteristics, we propose a data noise-based weighting mechanism to reduce the effect of propagating noise from LiDAR modality to monocular. For selecting which samples to label to improve the model performance, we propose a sensor consistency-based selection score that is also coherent with the training objective. Extensive experimental results on KITTI and Waymo datasets verify the effectiveness of our proposed framework. In particular, our selection strategy consistently outperforms state-of-the-art active learning baselines, yielding up to 17% better saving rate in labeling costs. Our training strategy attains the top place in KITTI 3D and birds-eye-view (BEV) monocular object detection official benchmarks by improving the BEV Average Precision (AP) by 2.02.
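The abstract states that unlabeled samples are selected by a sensor-consistency score: samples where the LiDAR teacher and the monocular student disagree most are the most informative to label. The paper's exact metric is not given here, so the following is a minimal hypothetical sketch in which each prediction is a vector of box parameters and disagreement is the mean absolute difference; the function names and the use of mean absolute difference are our assumptions, not the authors' formulation.

```python
import numpy as np

def consistency_selection_scores(teacher_preds, student_preds):
    """Score unlabeled samples by cross-sensor disagreement.

    Hypothetical sketch: the abstract only says selection uses a
    sensor-consistency-based score. Here each element is an array of
    predicted box parameters for one sample, and the score is the mean
    absolute difference between LiDAR-teacher and monocular-student
    predictions (higher = less consistent = more informative).
    """
    return [float(np.mean(np.abs(t - s)))
            for t, s in zip(teacher_preds, student_preds)]

def select_for_labeling(scores, budget):
    """Return indices of the `budget` most inconsistent samples."""
    order = np.argsort(scores)[::-1]  # descending by disagreement
    return order[:budget].tolist()
```

Under this reading, active learning proceeds by repeatedly scoring the unlabeled pool with the current teacher/student pair and sending the highest-scoring samples to annotators, which is coherent with the training objective since the same teacher supplies the pseudo-labels.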

Abstract (translated)

We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which exploits all modalities of the collected data during model development. LiDAR guides both the data selection and the training of the monocular 3D detector without adding any overhead at inference time. During training, we use the LiDAR-teacher, monocular-student cross-modal framework from semi-supervised learning to distill information from unlabeled data as pseudo-labels. To handle differences in sensor characteristics, we propose a data-noise-based weighting mechanism that reduces the noise propagated from the LiDAR modality to the monocular one. To choose which samples to label for improving model performance, we propose a sensor-consistency-based selection score that is also coherent with the training objective. Extensive experimental results on the KITTI and Waymo datasets verify the effectiveness of the proposed framework. In particular, our selection strategy consistently outperforms state-of-the-art active learning baselines, yielding up to 17% better saving rate in labeling costs, and our training strategy attains the top place in the official KITTI 3D and bird's-eye-view (BEV) monocular object detection benchmarks by improving the BEV Average Precision (AP).

URL

https://arxiv.org/abs/2307.08415

PDF

https://arxiv.org/pdf/2307.08415.pdf

