Abstract
As opposed to human drivers, current autonomous driving systems still require vast amounts of labeled data to train. Recently, world models have been proposed to simultaneously enhance autonomous driving capabilities by improving the way these systems understand complex real-world environments and to reduce their data demands via self-supervised pre-training. In this paper, we present AD-L-JEPA (Autonomous Driving with LiDAR data via a Joint Embedding Predictive Architecture), a novel self-supervised pre-training framework for autonomous driving with LiDAR data that, unlike existing methods, is neither generative nor contrastive. Our method learns spatial world models with a joint embedding predictive architecture. Instead of explicitly generating masked unknown regions, our self-supervised world models predict Bird's Eye View (BEV) embeddings to represent the diverse nature of autonomous driving scenes. Our approach furthermore eliminates the need to manually create positive and negative pairs, as required in contrastive learning. AD-L-JEPA thus leads to a simpler implementation and enhanced learned representations. We qualitatively and quantitatively demonstrate the high quality of the embeddings learned with AD-L-JEPA. We further evaluate the accuracy and label efficiency of AD-L-JEPA on popular downstream tasks such as LiDAR 3D object detection and associated transfer learning. Our experimental evaluation demonstrates that AD-L-JEPA is a plausible approach for self-supervised pre-training in autonomous driving applications and outperforms the state of the art (SOTA), including the recently proposed Occupancy-MAE [1] and ALSO [2]. The source code of AD-L-JEPA is available at this https URL.
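The core objective sketched in the abstract (predicting embeddings of masked BEV regions rather than reconstructing their contents, with no positive/negative pairs) can be illustrated with a toy, numpy-only sketch. This is not the authors' implementation: the grid size, the linear "encoders", the predictor, the masking ratio, and the EMA coefficient are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(bev, W):
    # Toy "encoder": a linear map per BEV cell followed by tanh.
    return np.tanh(bev @ W)

# Hypothetical shapes: a 16x16 BEV grid with 8 input features per cell,
# embedded into 32-dimensional vectors.
H, W_grid, C_in, C_emb = 16, 16, 8, 32
bev = rng.normal(size=(H * W_grid, C_in))

W_ctx = rng.normal(scale=0.1, size=(C_in, C_emb))   # context-encoder weights
W_tgt = W_ctx.copy()                                # target encoder (EMA copy)
W_pred = rng.normal(scale=0.1, size=(C_emb, C_emb)) # predictor weights

# Mask a random subset of BEV cells; the context encoder sees only the rest.
mask = rng.random(H * W_grid) < 0.5
ctx_emb = encode(np.where(mask[:, None], 0.0, bev), W_ctx)

# Predict embeddings of the masked cells from the context representation...
pred = ctx_emb @ W_pred
# ...and regress them onto target-encoder embeddings of the full scene.
# No masked-region generation, no positive/negative pairs: the loss lives
# entirely in embedding space.
tgt_emb = encode(bev, W_tgt)
loss = np.mean((pred[mask] - tgt_emb[mask]) ** 2)

# In training, the target encoder would typically track the context encoder
# via an exponential moving average (coefficient assumed here).
tau = 0.99
W_tgt = tau * W_tgt + (1 - tau) * W_ctx
print(float(loss))
```

The design choice this sketch highlights is the one the abstract emphasizes: because the prediction target is an embedding rather than raw LiDAR occupancy, the model never has to generate masked unknown regions, and because the loss is a direct regression, no contrastive pair construction is needed.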
URL
https://arxiv.org/abs/2501.04969