Abstract
The significant achievements of pre-trained models leveraging large volumes of data in NLP and 2D vision inspire us to explore the potential of large-scale data pre-training for 3D perception in autonomous driving. Toward this goal, this paper proposes to utilize massive unlabeled data from heterogeneous datasets to pre-train 3D perception models. We introduce a self-supervised pre-training framework that learns effective 3D representations from scratch on unlabeled data, combined with a prompt-adapter-based domain adaptation strategy to reduce dataset bias. The approach significantly improves model performance on downstream tasks such as 3D object detection, BEV segmentation, 3D object tracking, and occupancy prediction, and shows a steady performance gain as the training data volume scales up, demonstrating the potential to continually benefit 3D perception models for autonomous driving. We will release the source code to inspire further investigations in the community.
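To make the prompt-adapter idea more concrete, below is a minimal sketch (not the authors' released code) of how a dataset-conditioned adapter could sit on top of a shared 3D backbone during multi-dataset pre-training: each source dataset owns a small set of learnable prompt parameters that modulate the shared BEV features, so the backbone learns dataset-agnostic representations while the prompts absorb dataset-specific bias. Names such as `PromptAdapter`, `num_datasets`, and `bev_channels` are illustrative assumptions, not identifiers from the paper.

```python
# Hypothetical prompt-adapter sketch for multi-dataset 3D pre-training (PyTorch).
import torch
import torch.nn as nn


class PromptAdapter(nn.Module):
    """Per-dataset learnable prompts applied to shared BEV feature maps (illustrative only)."""

    def __init__(self, num_datasets: int, bev_channels: int, hidden: int = 64):
        super().__init__()
        # One learnable prompt vector per source dataset.
        self.prompts = nn.Embedding(num_datasets, bev_channels)
        # Lightweight bottleneck turning the prompt into per-channel scale/shift.
        self.to_affine = nn.Sequential(
            nn.Linear(bev_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * bev_channels),
        )

    def forward(self, bev_feat: torch.Tensor, dataset_id: torch.Tensor) -> torch.Tensor:
        # bev_feat: (B, C, H, W) features from the shared encoder; dataset_id: (B,) source index.
        prompt = self.prompts(dataset_id)                      # (B, C)
        scale, shift = self.to_affine(prompt).chunk(2, dim=-1)  # (B, C) each
        scale = scale.unsqueeze(-1).unsqueeze(-1)              # (B, C, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return bev_feat * (1.0 + scale) + shift                # dataset-conditioned modulation


if __name__ == "__main__":
    adapter = PromptAdapter(num_datasets=3, bev_channels=128)
    feats = torch.randn(2, 128, 50, 50)    # toy BEV features from a shared encoder
    ids = torch.tensor([0, 2])             # which heterogeneous dataset each sample came from
    print(adapter(feats, ids).shape)       # torch.Size([2, 128, 50, 50])
```

The design choice illustrated here is that only the small prompt and bottleneck parameters vary across datasets, while the heavy backbone is shared, which is one plausible way to reduce dataset bias without fragmenting the pre-training data.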
URL
https://arxiv.org/abs/2504.12709