Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation

2023-11-29 09:55:13
Yu Zheng, Guangming Wang, Jiuming Liu, Marc Pollefeys, Hesheng Wang

Abstract

LiDAR point cloud semantic segmentation enables robots to obtain fine-grained semantic information about the surrounding environment. Recently, many works project the point cloud onto a 2D image and adopt 2D Convolutional Neural Networks (CNNs) or vision transformers for LiDAR point cloud semantic segmentation. However, since more than one point can be projected onto the same 2D position but only one point can be preserved, previous 2D image-based segmentation methods suffer from inevitable quantized information loss. To avoid this loss, we propose a novel spherical frustum structure, in which all points projected onto the same 2D position are preserved. Moreover, we propose a memory-efficient hash-based representation of spherical frustums. Building on this representation, we propose Spherical Frustum sparse Convolution (SFC) and Frustum Fast Point Sampling (F2PS) to convolve and sample the points stored in spherical frustums, respectively. Finally, we present the Spherical Frustum sparse Convolution Network (SFCNet), which adopts 2D CNNs for LiDAR point cloud semantic segmentation without quantized information loss. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate that SFCNet outperforms 2D image-based semantic segmentation methods built on conventional spherical projection. The source code will be released later.
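
To make the quantized information loss concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the widely used range-image form of spherical projection; the image size (64 x 2048), the field of view (+3 to -25 degrees), and the function names spherical_pixel, conventional_projection, and build_frustums are all hypothetical choices made for illustration. A plain dictionary keyed by pixel stands in for the hash-based frustum representation described in the abstract.

```python
import math
from collections import defaultdict


def spherical_pixel(point, height=64, width=2048,
                    fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a 3D point (x, y, z) to a (row, col) pixel of a range image.

    Assumes the point is not at the sensor origin (range > 0).
    The image size and field of view are illustrative defaults.
    """
    x, y, z = point
    rng = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)        # horizontal angle
    pitch = math.asin(z / rng)    # vertical angle
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    col = int(0.5 * (1.0 - yaw / math.pi) * width)
    row = int((1.0 - (pitch - fov_down) / fov) * height)
    # Clamp to the image bounds.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col


def conventional_projection(points):
    """Keep only the nearest point per pixel; the others are dropped."""
    nearest = {}
    for idx, (x, y, z) in enumerate(points):
        key = spherical_pixel((x, y, z))
        rng = math.sqrt(x * x + y * y + z * z)
        if key not in nearest or rng < nearest[key][1]:
            nearest[key] = (idx, rng)
    return {key: idx for key, (idx, _) in nearest.items()}


def build_frustums(points):
    """Keep every point: each pixel key maps to a list of point indices."""
    frustums = defaultdict(list)
    for idx, point in enumerate(points):
        frustums[spherical_pixel(point)].append(idx)
    return dict(frustums)


# Two points along the same ray plus one point in another direction.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
image = conventional_projection(points)
frustums = build_frustums(points)
print(len(image), "points survive conventional projection")
print(sum(len(v) for v in frustums.values()), "points kept in frustums")
```

In this toy input, the first two points lie on the same ray from the sensor, so they fall on the same pixel. The conventional projection keeps only the nearer one, while the dictionary of frustums keeps both under one key and stores nothing for empty pixels.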

URL

https://arxiv.org/abs/2311.17491

PDF

https://arxiv.org/pdf/2311.17491.pdf
