Spherical Transformer for LiDAR-based 3D Recognition

2023-03-22 17:30:14
Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia

Abstract

LiDAR-based 3D point cloud recognition has benefited various applications. Without specially considering the LiDAR point distribution, most current methods suffer from information disconnection and a limited receptive field, especially for the sparse distant points. In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones. We design radial window self-attention that partitions the space into multiple non-overlapping, narrow and long windows. It overcomes the disconnection issue and enlarges the receptive field smoothly and dramatically, which significantly boosts the performance on sparse distant points. Moreover, to fit the narrow and long windows, we propose exponential splitting to yield fine-grained position encoding, and dynamic feature selection to increase model representation ability. Notably, our method ranks 1st on both the nuScenes and SemanticKITTI semantic segmentation benchmarks with 81.9% and 74.8% mIoU, respectively. We also achieve 3rd place on the nuScenes object detection benchmark with 72.8% NDS and 68.5% mAP. Code is available at this https URL.
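
The radial-window idea is easy to picture in code. Below is a minimal NumPy sketch, not the authors' implementation: each point is mapped to spherical coordinates (r, θ, φ), a window collects all points that share a pair of discretized angles (so every window is a narrow wedge spanning the full radial range, linking dense close points to sparse distant ones), and the radius is split exponentially so position-encoding bins stay fine-grained near the sensor. The window counts, radial range, and function names here are illustrative assumptions.

```python
# Sketch of radial window partitioning and exponential radial splitting,
# assuming illustrative window counts and radial range (not the paper's values).
import numpy as np

def radial_window_index(xyz, num_theta=64, num_phi=16):
    """Assign each point to a radial window by discretizing its two angles.

    xyz: (N, 3) array of LiDAR points. Returns (N,) integer window ids.
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.linalg.norm(xyz, axis=1)
    theta = np.arctan2(y, x)  # azimuth in [-pi, pi]
    phi = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))  # elevation
    # Discretize only the angles; the radius is deliberately NOT partitioned,
    # so each window stretches from the sensor out to the farthest point.
    t_idx = np.floor((theta + np.pi) / (2 * np.pi) * num_theta).astype(int)
    t_idx = np.clip(t_idx, 0, num_theta - 1)
    p_idx = np.floor((phi + np.pi / 2) / np.pi * num_phi).astype(int)
    p_idx = np.clip(p_idx, 0, num_phi - 1)
    return t_idx * num_phi + p_idx

def exponential_split(r, num_bins=8, r_min=0.5, r_max=80.0):
    """Split the radial axis into exponentially growing bins, so position
    encoding is fine-grained near the sensor and coarse far away."""
    edges = r_min * (r_max / r_min) ** (np.arange(1, num_bins) / num_bins)
    return np.digitize(r, edges)  # (N,) bin ids in [0, num_bins - 1]

# Usage: group points by window id, then run self-attention within each group,
# using the radial bin ids as part of the relative position encoding.
pts = np.random.randn(1000, 3) * 20.0
win_ids = radial_window_index(pts)
rad_bins = exponential_split(np.linalg.norm(pts, axis=1))
```

Keeping r unpartitioned is the key design choice: a distant sparse point attends to the dense points near the sensor inside the same wedge, which is exactly the long-range aggregation the abstract describes.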

URL

https://arxiv.org/abs/2303.12766

PDF

https://arxiv.org/pdf/2303.12766.pdf

