Paper Reading AI Learner

Multi-View Adaptive Fusion Network for 3D Object Detection

2020-11-02 00:06:01
Guojun Wang, Bin Tian, Yachen Zhang, Long Chen, Dongpu Cao, Jian Wu

Abstract

3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. However, it has been surprisingly difficult to effectively fuse both modalities without information loss and interference. To solve this issue, we propose a single-stage multi-view fusion framework that takes LiDAR Birds-Eye View, LiDAR Range View and Camera View images as inputs for 3D object detection. To effectively fuse multi-view features, we propose an Attentive Pointwise Fusion (APF) module to estimate the importance of the three sources with attention mechanisms which can achieve adaptive fusion of multi-view features in a pointwise manner. Besides, an Attentive Pointwise Weighting (APW) module is designed to help the network learn structure information and point feature importance with two extra tasks: foreground classification and center regression, and the predicted foreground probability will be used to reweight the point features. We design an end-to-end learnable network named MVAF-Net to integrate these two components. Our evaluations conducted on the KITTI 3D object detection datasets demonstrate that the proposed APF and APW module offer significant performance gain and that the proposed MVAF-Net achieves state-of-the-art performance in the KITTI benchmark.

Abstract (translated)

URL

https://arxiv.org/abs/2011.00652

PDF

https://arxiv.org/pdf/2011.00652.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot