Abstract
LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained in one environment and evaluated in another often suffer performance degradation. Domain adaptation approaches address this problem by assuming access to unannotated samples from the test distribution. In the real world, however, the exact deployment conditions may be unknown at training time, and samples representative of the test dataset may be unavailable. We argue that the more realistic and challenging formulation is to require robust performance on unseen target domains. We address this problem in a two-pronged manner. First, we leverage the paired LiDAR-image data present in most autonomous driving datasets to perform multimodal object detection. We suggest that using multimodal features from both images and LiDAR point clouds for scene understanding yields object detectors that are more robust to unseen domain shifts. Second, we train a 3D object detector to learn multimodal object features across different distributions and promote feature invariance across these source domains to improve generalizability to unseen target domains. To this end, we propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection that aligns object features from same-class samples of different domains while pushing features from different classes apart. We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
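To make the alignment idea concrete, the following is a minimal sketch of a generic supervised contrastive loss, written in plain Python. It is not the CLIX$^\text{3D}$ implementation: the function names, the temperature value, and the toy 2D features are assumptions chosen for illustration. Features with the same class label are treated as positives regardless of which source domain they come from, so minimizing the loss pulls same-class features together and pushes different-class features apart.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of object features.

    features: list of equal-length, non-zero float vectors (hypothetical
              object embeddings pooled from several source domains).
    labels:   list of class ids, one per feature. Domain ids are not needed,
              because same-class samples are positives across all domains.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two "car" features (imagine one per source domain) and one "ped".
aligned = supcon_loss([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], ["car", "car", "ped"])
misaligned = supcon_loss([[1.0, 0.0], [0.0, 1.0], [0.1, 0.9]], ["car", "car", "ped"])
print(aligned < misaligned)
```

In the toy batch, the loss is lower when the two "car" features point in nearly the same direction than when one of them sits next to the "ped" feature, which is the behavior the alignment objective is meant to encourage.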
URL
https://arxiv.org/abs/2404.11764