Abstract
We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which leverages all modalities of collected data during model development. We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase. During training, we leverage a cross-modal LiDAR-teacher, monocular-student framework from semi-supervised learning to distill information from unlabeled data as pseudo-labels. To handle the differences in sensor characteristics, we propose a data-noise-based weighting mechanism that reduces the effect of propagating noise from the LiDAR modality to the monocular one. To select which samples to label for improving model performance, we propose a sensor-consistency-based selection score that is also coherent with the training objective. Extensive experimental results on the KITTI and Waymo datasets verify the effectiveness of our proposed framework. In particular, our selection strategy consistently outperforms state-of-the-art active learning baselines, yielding up to a 17% better saving rate in labeling costs. Our training strategy attains the top place in the official KITTI 3D and bird's-eye-view (BEV) monocular object detection benchmarks, improving BEV Average Precision (AP) by 2.02.
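The core active learning idea can be illustrated with a minimal sketch: score each unlabeled sample by how well the LiDAR teacher's and the monocular student's 3D predictions agree, and send the least consistent samples to annotation. The scoring function below (center-distance agreement between matched boxes) is a hypothetical simplification for illustration, not the paper's actual selection score.

```python
import numpy as np

def box_center_consistency(teacher_centers: np.ndarray,
                           student_centers: np.ndarray) -> float:
    """Toy sensor-consistency score between matched 3D box predictions.

    teacher_centers, student_centers: (N, 3) arrays of matched box
    centers (x, y, z) from the LiDAR teacher and the monocular student.
    Returns a score in (0, 1]; 1.0 means perfect agreement.
    """
    dists = np.linalg.norm(teacher_centers - student_centers, axis=1)
    # Map center distances to a bounded agreement score:
    # small distances -> score near 1, large distances -> score near 0.
    return float(np.mean(np.exp(-dists)))

def select_for_labeling(pool_scores, budget: int):
    """Pick the `budget` unlabeled samples with the LOWEST consistency,
    i.e. those where the two modalities disagree most."""
    order = np.argsort(pool_scores)  # ascending: least consistent first
    return order[:budget].tolist()
```

In this sketch, low consistency is treated as a proxy for samples the monocular model handles poorly, which keeps the selection criterion aligned with the cross-modal distillation objective described above.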
URL
https://arxiv.org/abs/2307.08415