Paper Reading AI Learner

Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud?

2024-03-05 09:38:11
Chenqiang Gao, Chuandong Liu, Jun Shu, Fangcen Liu, Jiang Liu, Luyu Yang, Xinbo Gao, Deyu Meng

Abstract

Current state-of-the-art (SOTA) 3D object detection methods often require a large number of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework, in which we annotate just one 3D object per scene. Such a sparse annotation strategy can significantly reduce the heavy annotation burden, while the resulting inexact and incomplete sparse supervision may severely deteriorate detection performance. To address this issue, we develop the SS3D++ method, which alternately improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using sparse annotations as seeds, we progressively generate confident fully-annotated scenes by designing a missing-annotated instance mining module and a reliable background mining module. Our proposed method produces competitive results compared with SOTA weakly-supervised methods that use the same or even higher annotation cost. Besides, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. Additional unlabeled training scenes can further boost performance. The code will be available at this https URL.
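The alternating scheme described in the abstract can be sketched roughly as follows. All names and values here are hypothetical illustrations, not the authors' actual implementation; the sketch assumes confidence-thresholded pseudo-labeling stands in for the missing-annotated instance mining module, which in SS3D++ is considerably more involved.

```python
# Minimal sketch of the alternating sparse-to-dense training loop: each scene
# starts from a single annotated object (the sparse "seed"), and each round
# promotes confident detections to pseudo ground-truth labels before retraining.

def mine_confident_labels(scene_scores, threshold=0.9):
    """Stand-in for missing-annotated instance mining: keep the indices of
    candidate boxes whose detector confidence exceeds a threshold."""
    return [i for i, s in enumerate(scene_scores) if s >= threshold]

def train_round(labels, predictions):
    """One alternation: mine new instances per scene, then retrain the
    detector on the enlarged label set (retraining is stubbed out here)."""
    for scene_id, scores in predictions.items():
        mined = mine_confident_labels(scores)
        labels[scene_id] = sorted(set(labels[scene_id]) | set(mined))
    return labels

# One annotated object index per scene (the sparse seeds).
labels = {0: [2], 1: [0]}
# Toy detector confidences for each candidate box in each scene.
preds = {0: [0.3, 0.95, 1.0], 1: [1.0, 0.5, 0.92]}

labels = train_round(labels, preds)
print(labels)  # {0: [1, 2], 1: [0, 2]}
```

In the full method, a reliable background mining module plays the complementary role of deciding which unannotated regions can safely be treated as background rather than as potential missing objects.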

Abstract (translated)

Current state-of-the-art 3D object detection methods typically require a large number of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised datasets is extremely costly. To reduce the tedious data annotation process, we propose a novel sparsely-annotated framework, in which we annotate only one 3D object per scene. This sparse annotation strategy can significantly reduce the heavy annotation burden, but the inexact and incomplete sparse supervision may severely degrade detection performance. To address this issue, we develop the SS3D++ method, which alternately improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using the sparse annotations as seeds, we progressively generate confident fully-annotated scenes by designing a missing-annotated instance mining module and a reliable background mining module. Compared with SOTA weakly-supervised methods, our method produces competitive results at the same or even higher annotation cost. Moreover, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. Additional unlabeled training scenes can further boost performance. The code will be released at: https://www.xxx.com.

URL

https://arxiv.org/abs/2403.02818

PDF

https://arxiv.org/pdf/2403.02818.pdf
