Abstract
Current state-of-the-art (SOTA) 3D object detection methods often require a large number of 3D bounding box annotations for training. However, collecting such large-scale, densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework in which we annotate only one 3D object per scene. Such a sparse annotation strategy significantly reduces the heavy annotation burden, although inexact and incomplete sparse supervision may severely deteriorate detection performance. To address this issue, we develop the SS3D++ method, which alternately improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using the sparse annotations as seeds, we progressively generate confident fully-annotated scenes by designing a missing-annotated instance mining module and a reliable background mining module. Our proposed method produces competitive results compared with SOTA weakly-supervised methods that use the same or even higher annotation cost. Moreover, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. Additional unlabeled training scenes can further boost performance. The code will be available at this https URL.
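The alternating scheme described above (train on the current confident labels, then mine further high-confidence, not-yet-annotated instances as pseudo-labels) could be sketched roughly as follows. This is a simplified, hypothetical stand-in, not the authors' SS3D++ implementation: all names (`mine_confident_instances`, `alternate_rounds`, the scene/detection dictionaries) are invented for illustration, and the "detector" is a toy callback.

```python
# Hypothetical sketch of the alternating loop in a sparsely-annotated
# framework: each scene starts with one seed annotation; each round, a
# detector trained on the current labels proposes boxes, and confident
# proposals for still-unlabeled objects are added back as pseudo-labels.

def mine_confident_instances(detections, annotated_ids, score_thresh=0.9):
    """Missing-annotated instance mining: keep detections that are both
    confident and not already in the annotation set."""
    return [d for d in detections
            if d["score"] >= score_thresh and d["id"] not in annotated_ids]

def alternate_rounds(scenes, detect, rounds=3, score_thresh=0.9):
    """Alternately (1) 'train' a detector on the current labels (here the
    detect callback stands in for a trained model) and (2) mine new
    pseudo-labels, growing each scene toward full annotation."""
    labels = {s["name"]: set(s["seed_ids"]) for s in scenes}
    for _ in range(rounds):
        for s in scenes:
            dets = detect(s, labels[s["name"]])
            for d in mine_confident_instances(dets, labels[s["name"]],
                                              score_thresh):
                labels[s["name"]].add(d["id"])
    return labels

# Toy usage: one scene with a single seed box and two unlabeled candidates.
scenes = [{"name": "s0", "seed_ids": {"car_0"},
           "candidates": [{"id": "car_1", "score": 0.95},
                          {"id": "ped_0", "score": 0.60}]}]

def toy_detect(scene, annotated_ids):
    # A real detector would be retrained each round; here we return
    # fixed candidate scores for determinism.
    return scene["candidates"]

labels = alternate_rounds(scenes, toy_detect)
# car_1 (score 0.95) is mined; ped_0 (score 0.60) stays unlabeled.
```

Only the instance-mining side is sketched here; the reliable background mining module, which plays the complementary role for negatives, is omitted for brevity.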
URL
https://arxiv.org/abs/2403.02818