Paper Reading AI Learner

RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

2023-05-25 09:26:04
Lukas Stäcker, Shashank Mishra, Philipp Heidenreich, Jason Rambach, Didier Stricker

Abstract

Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, with the exception of the nuScenes dataset. Another reason is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion that is better suited for radars. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to a 28% increase in the nuScenes detection score, which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
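The paper does not detail BEVFeatureNet here, but the core idea the abstract describes — bringing a sparse radar point cloud onto the same BEV grid as camera features so the two can be fused — can be illustrated with a minimal NumPy sketch. Everything below (the function name `radar_to_bev`, the 0.5 m cell size, the choice of RCS and radial velocity as point features, and last-write-wins aggregation) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def radar_to_bev(points, feats, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                 cell=0.5):
    """Scatter sparse radar points into a dense BEV feature grid.

    points: (N, 2) array of (x, y) positions in metres.
    feats:  (N, C) array of per-point features (e.g. RCS, radial velocity).
    Returns an (H, W, C) grid aligned with a camera BEV feature map.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((ny, nx, feats.shape[1]), dtype=np.float32)

    # Discretise metric coordinates into integer cell indices.
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, f in zip(iy[valid], ix[valid], feats[valid]):
        grid[i, j] = f  # later points overwrite earlier ones in the same cell
    return grid

# Toy example: two radar detections with (RCS, radial velocity) features.
pts = np.array([[10.0, 5.0], [-20.0, 0.0]])
fts = np.array([[12.0, 3.0], [8.0, -1.0]])
bev = radar_to_bev(pts, fts)  # (200, 200, 2) grid, mostly zeros
```

Once both modalities live on the same BEV grid, fusion can be as simple as channel-wise concatenation followed by a convolution; a learned encoder such as BEVFeatureNet replaces the hand-crafted aggregation used above.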

URL

https://arxiv.org/abs/2305.15883

PDF

https://arxiv.org/pdf/2305.15883.pdf

