Paper Reading AI Learner

DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection

2023-05-24 15:00:01
Yao Rong, Xiangyu Wei, Tianwei Lin, Yueyu Wang, Enkelejda Kasneci

Abstract

Augmenting LiDAR input with multiple previous frames provides richer semantic information and thus boosts performance in 3D object detection. However, crowded point clouds in multi-frame input can blur precise position information due to motion blur and inaccurate point projection. In this work, we propose a novel feature fusion strategy, DynStaF (Dynamic-Static Fusion), which enhances the rich semantic information provided by multiple frames (dynamic branch) with the accurate location information from the current single frame (static branch). To effectively extract and aggregate complementary features, DynStaF contains two modules, Neighborhood Cross Attention (NCA) and Dynamic-Static Interaction (DSI), operating through a dual-pathway architecture. NCA takes the features in the static branch as queries and the features in the dynamic branch as keys (values). When computing the attention, we address the sparsity of point clouds and take only neighborhood positions into consideration. NCA fuses the two feature streams at different feature map scales, after which DSI provides comprehensive interaction. To analyze our proposed strategy DynStaF, we conduct extensive experiments on the nuScenes dataset. On the test set, DynStaF increases the performance of PointPillars in NDS by a large margin, from 57.7% to 61.6%. When combined with CenterPoint, our framework achieves 61.0% mAP and 67.7% NDS, leading to state-of-the-art performance without bells and whistles.
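The neighborhood-restricted cross-attention described in the abstract can be sketched as follows. This is a minimal, illustrative NumPy version only: the window size `k`, zero padding, and single-head scaled dot-product form are assumptions for demonstration, not the paper's exact implementation (which operates on multi-scale BEV feature maps inside a detector backbone).

```python
import numpy as np

def neighborhood_cross_attention(static_feat, dynamic_feat, k=3):
    """Sketch of Neighborhood Cross Attention (NCA): queries come from the
    static (single-frame) feature map, keys/values come from the dynamic
    (multi-frame) feature map, and attention is restricted to a k x k
    window around each query position to respect point-cloud sparsity."""
    H, W, C = static_feat.shape
    r = k // 2
    # Zero-pad the dynamic map so every query has a full k x k neighborhood.
    padded = np.pad(dynamic_feat, ((r, r), (r, r), (0, 0)))
    out = np.empty_like(static_feat)
    for i in range(H):
        for j in range(W):
            q = static_feat[i, j]                         # query vector: (C,)
            kv = padded[i:i + k, j:j + k].reshape(-1, C)  # local keys/values: (k*k, C)
            scores = kv @ q / np.sqrt(C)                  # scaled dot-product scores
            w = np.exp(scores - scores.max())
            w /= w.sum()                                  # softmax over the neighborhood
            out[i, j] = w @ kv                            # attention-weighted sum of values
    return out

# Toy usage: fuse two 8x8 feature maps with 16 channels.
rng = np.random.default_rng(0)
static = rng.standard_normal((8, 8, 16)).astype(np.float32)
dynamic = rng.standard_normal((8, 8, 16)).astype(np.float32)
fused = neighborhood_cross_attention(static, dynamic, k=3)
print(fused.shape)  # (8, 8, 16)
```

In the full DynStaF pipeline this fusion is applied at several feature-map scales and its output is then passed to the DSI module for further interaction between the two branches.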


URL

https://arxiv.org/abs/2305.15219

PDF

https://arxiv.org/pdf/2305.15219.pdf

