Abstract
Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, with the exception of the nuScenes dataset. Another reason is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion, which is better suited for radars. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to a 28% increase in the nuScenes detection score, which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
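The core idea of fusing on the BEV plane can be illustrated with a minimal sketch: sparse radar points are scattered into a dense grid that shares the spatial layout of camera BEV features, so the two modalities can be combined cell by cell. This is a generic illustration only, not the paper's BEVFeatureNet; the grid size, per-cell feature averaging, and simple additive fusion are assumptions for the sake of the example.

```python
import numpy as np

def radar_to_bev(points, grid=(8, 8), extent=40.0, channels=3):
    """Scatter sparse radar points into a dense BEV feature grid.

    points: (N, 2 + channels) array of [x, y, feat...] with x, y in
    metres within [-extent, extent). Features of points that fall
    into the same grid cell are averaged (an assumption; a learned
    encoder would aggregate them instead).
    """
    bev = np.zeros((channels, *grid), dtype=np.float32)
    counts = np.zeros(grid, dtype=np.float32)
    cell = 2 * extent / np.array(grid)           # metres per cell
    idx = ((points[:, :2] + extent) // cell).astype(int)
    valid = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    for (ix, iy), feat in zip(idx[valid], points[valid, 2:]):
        bev[:, ix, iy] += feat
        counts[ix, iy] += 1
    nz = counts > 0
    bev[:, nz] /= counts[nz]                     # average per cell
    return bev

# Hypothetical camera BEV feature map of the same shape; fusion is
# shown here as plain element-wise addition for illustration.
cam_bev = np.zeros((3, 8, 8), dtype=np.float32)
radar_pts = np.array([[-35.0, 10.0, 1.0, 0.5, -2.0],
                      [ 12.0, -3.0, 0.2, 0.1,  4.0]])
fused = cam_bev + radar_to_bev(radar_pts)
```

Because both feature maps live on the same BEV grid, the fusion step itself is trivial; the real design work lies in how each branch produces its features.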
URL
https://arxiv.org/abs/2305.15883