Abstract
The fusion of multimodal sensor data streams, such as camera images and lidar point clouds, plays an important role in the operation of autonomous vehicles (AVs). Robust perception across a range of adverse weather and lighting conditions is a prerequisite for AVs to be deployed widely. While multi-sensor fusion networks have previously been developed for perception in sunny and clear weather conditions, these methods show significant performance degradation under night-time and poor weather conditions. In this paper, we propose a simple yet effective technique called ContextualFusion to incorporate into 3D object detection models the domain knowledge that cameras and lidars behave differently across lighting and weather variations. Specifically, we design a Gated Convolutional Fusion (GatedConv) approach that fuses the sensor streams based on the operational context. To aid in our evaluation, we use the open-source simulator CARLA to create a multimodal adverse-condition dataset called AdverseOp3D, addressing the shortcoming that existing datasets are biased towards daytime and good-weather conditions. Our ContextualFusion approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset. Finally, our method enhances state-of-the-art 3D object detection performance at night on the real-world NuScenes dataset with a significant mAP improvement of 11.7%.
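The core idea of context-gated fusion can be illustrated with a minimal sketch: a gate computed from operational-context flags (e.g. night, rain) blends the camera and lidar feature branches, shifting weight toward lidar when the camera is unreliable. Note this is an illustrative toy with made-up names and hand-set parameters, not the paper's actual GatedConv layer, which operates on convolutional feature maps and is learned end-to-end.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(cam_feat, lidar_feat, context, w, b):
    """Blend camera and lidar features with a scalar gate derived from context.

    context: hypothetical condition flags, e.g. [is_night, is_rain].
    w, b:    gate parameters (learned in a real model; hand-set here).
    """
    g = sigmoid(sum(c * wi for c, wi in zip(context, w)) + b)  # gate in (0, 1)
    # g weights the camera branch; (1 - g) weights the lidar branch
    return [g * c + (1.0 - g) * l for c, l in zip(cam_feat, lidar_feat)]

# Toy usage: adverse context should down-weight the camera branch.
cam = [1.0] * 4      # stand-in camera features
lidar = [0.0] * 4    # stand-in lidar features
w, b = [-4.0, -2.0], 2.0
day = gated_fuse(cam, lidar, [0.0, 0.0], w, b)    # gate = sigmoid(2)  ~ 0.88
night = gated_fuse(cam, lidar, [1.0, 0.0], w, b)  # gate = sigmoid(-2) ~ 0.12
```

With these hand-set parameters, the daytime output stays close to the camera features while the night-time output leans toward the lidar features, mirroring the domain knowledge the paper injects.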
URL
https://arxiv.org/abs/2404.14780