Paper Reading AI Learner

ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions

2024-04-23 06:37:54
Shounak Sural, Nishad Sahu, Ragunathan (Raj) Rajkumar

Abstract

The fusion of multimodal sensor data streams such as camera images and lidar point clouds plays an important role in the operation of autonomous vehicles (AVs). Robust perception across a range of adverse weather and lighting conditions is specifically required for AVs to be deployed widely. While multi-sensor fusion networks have previously been developed for perception in sunny and clear weather conditions, these methods show a significant degradation in performance under night-time and poor weather conditions. In this paper, we propose a simple yet effective technique called ContextualFusion to incorporate into 3D object detection models the domain knowledge that cameras and lidars behave differently across lighting and weather variations. Specifically, we design a Gated Convolutional Fusion (GatedConv) approach for the fusion of sensor streams based on the operational context. To aid in our evaluation, we use the open-source simulator CARLA to create a multimodal adverse-condition dataset called AdverseOp3D, addressing the shortcoming that existing datasets are biased towards daytime and good-weather conditions. Our ContextualFusion approach yields an mAP improvement of 6.2% over state-of-the-art methods on our context-balanced synthetic dataset. Finally, our method enhances state-of-the-art 3D object detection performance at night on the real-world NuScenes dataset with a significant mAP improvement of 11.7%.
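The core idea of context-gated fusion can be sketched as follows. This is an illustrative toy example, not the authors' implementation: the feature shapes, the linear gating function, and all variable names (`cam_feat`, `lidar_feat`, `context`) are assumptions. A small context vector (e.g. flags for night-time or rain) is mapped to a per-channel sigmoid gate that decides how much weight to give camera versus lidar features before the fused map is passed to the detection head.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(cam_feat, lidar_feat, context, w, b):
    """Context-gated fusion of two (C, H, W) feature maps.

    The context vector (e.g. [night, rain] indicator flags) is mapped
    through a learned linear layer (w, b) to one sigmoid gate per
    channel; the gate blends camera and lidar features channel-wise.
    All of this is a hypothetical sketch of the gating idea.
    """
    gate = sigmoid(context @ w + b)              # shape (C,), one gate per channel
    gate = gate[:, None, None]                   # broadcast over H and W
    return gate * cam_feat + (1.0 - gate) * lidar_feat

# Toy example: 4-channel 8x8 bird's-eye-view feature maps.
rng = np.random.default_rng(0)
cam = rng.standard_normal((4, 8, 8))
lidar = rng.standard_normal((4, 8, 8))
context = np.array([1.0, 0.0])                   # e.g. night=1, rain=0
w = rng.standard_normal((2, 4))
b = np.zeros(4)

fused = gated_fusion(cam, lidar, context, w, b)
```

In a trained network, `w` and `b` would be learned so that, for instance, night-time contexts down-weight camera channels in favour of lidar; here they are random placeholders.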

URL

https://arxiv.org/abs/2404.14780

PDF

https://arxiv.org/pdf/2404.14780.pdf

