Abstract
Augmenting LiDAR input with multiple previous frames provides richer semantic information and thus boosts performance in 3D object detection. However, the crowded point clouds of multi-frame input can degrade precise position information because of motion blur and inaccurate point projection. In this work, we propose a novel feature fusion strategy, DynStaF (Dynamic-Static Fusion), which enhances the rich semantic information provided by the multi-frame input (dynamic branch) with the accurate location information from the current single frame (static branch). To effectively extract and aggregate complementary features, DynStaF contains two modules, Neighborhood Cross Attention (NCA) and Dynamic-Static Interaction (DSI), operating through a dual-pathway architecture. NCA takes the features in the static branch as queries and the features in the dynamic branch as keys and values. When computing the attention, we account for the sparsity of point clouds and take only neighborhood positions into consideration. NCA fuses the two features at different feature map scales, followed by DSI, which provides comprehensive interaction. To analyze our proposed strategy DynStaF, we conduct extensive experiments on the nuScenes dataset. On the test set, DynStaF increases the performance of PointPillars in NDS by a large margin from 57.7% to 61.6%. When combined with CenterPoint, our framework achieves 61.0% mAP and 67.7% NDS, leading to state-of-the-art performance without bells and whistles.
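The neighborhood-restricted cross attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `neighborhood_cross_attention`, the `radius` parameter, the dense `H x W x C` nested-list layout, the single attention head, the absence of learned query/key/value projections, and the clipping of windows at the map border are all assumptions made for illustration. Each static-branch cell acts as a query and attends only to dynamic-branch cells inside a small window around the same position.

```python
import math


def neighborhood_cross_attention(static, dynamic, radius=1):
    """Hypothetical sketch: static features are queries, dynamic features are keys/values.

    static, dynamic: H x W x C nested lists (e.g. bird's-eye-view feature maps).
    Each query attends only to dynamic cells within a (2*radius+1)^2 window,
    clipped at the map border (an assumption, not taken from the paper).
    """
    H, W, C = len(static), len(static[0]), len(static[0][0])
    scale = 1.0 / math.sqrt(C)  # standard scaled dot-product factor
    out = [[None] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            q = static[i][j]
            # Keys/values: dynamic-branch features in the local neighborhood only.
            neigh = [
                dynamic[y][x]
                for y in range(max(0, i - radius), min(H, i + radius + 1))
                for x in range(max(0, j - radius), min(W, j + radius + 1))
            ]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in neigh]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            # Attention-weighted sum of the neighboring dynamic features.
            out[i][j] = [
                sum(w * v[c] for w, v in zip(weights, neigh)) / z for c in range(C)
            ]
    return out
```

Restricting each query to a local window keeps the cost proportional to the window size rather than to the full map size. With `radius=0`, each static cell attends only to the dynamic cell at the same position, so the output equals the dynamic feature map.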
URL
https://arxiv.org/abs/2305.15219