Paper Reading AI Learner

Hierarchical Attention Learning of Scene Flow in 3D Point Clouds

2020-10-12 14:56:08
Guangming Wang, Xinrui Wu, Zhe Liu, Hesheng Wang

Abstract

Scene flow represents the 3D motion of every point in the dynamic environments. Like the optical flow that represents the motion of pixels in 2D images, 3D motion representation of scene flow benefits many applications, such as autonomous driving and service robot. This paper studies the problem of scene flow estimation from two consecutive 3D point clouds. In this paper, a novel hierarchical neural network with double attention is proposed for learning the correlation of point features in adjacent frames and refining scene flow from coarse to fine layer by layer. The proposed network has a new more-for-less hierarchical architecture. The more-for-less means that the number of input points is greater than the number of output points for scene flow estimation, which brings more input information and balances the precision and resource consumption. In this hierarchical architecture, scene flow of different levels is generated and supervised respectively. A novel attentive embedding module is introduced to aggregate the features of adjacent points using a double attention method in a patch-to-patch manner. The proper layers for flow embedding and flow supervision are carefully considered in our network designment. Experiments show that the proposed network outperforms the state-of-the-art performance of 3D scene flow estimation on the FlyingThings3D and KITTI Scene Flow 2015 datasets. We also apply the proposed network to realistic LiDAR odometry task, which is an key problem in autonomous driving. The experiment results demonstrate that our proposed network can outperform the ICP-based method and shows the good practical application ability.

Abstract (translated)

URL

https://arxiv.org/abs/2010.05762

PDF

https://arxiv.org/pdf/2010.05762.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot