Abstract
LiDAR point cloud semantic segmentation enables robots to obtain fine-grained semantic information about the surrounding environment. Recently, many works have projected the point cloud onto a 2D image and adopted 2D Convolutional Neural Networks (CNNs) or vision transformers for LiDAR point cloud semantic segmentation. However, since more than one point can be projected onto the same 2D position but only one point can be preserved, previous 2D image-based segmentation methods suffer from inevitable quantized information loss. To avoid this loss, in this paper we propose a novel spherical frustum structure in which all points projected onto the same 2D position are preserved. Moreover, we propose a memory-efficient hash-based representation of spherical frustums. Building on this representation, we propose Spherical Frustum sparse Convolution (SFC) and Frustum Fast Point Sampling (F2PS) to convolve and sample the points stored in spherical frustums, respectively. Finally, we present the Spherical Frustum sparse Convolution Network (SFCNet), which adopts 2D CNNs for LiDAR point cloud semantic segmentation without quantized information loss. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate that our SFCNet outperforms 2D image-based semantic segmentation methods based on conventional spherical projection. The source code will be released later.
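The core idea, grouping every point that projects to the same spherical-image pixel into a "frustum" keyed by a hash of that pixel, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the field-of-view bounds, image size, and function names are assumptions chosen to match typical spherical-projection setups.

```python
# Illustrative sketch (not the paper's code): spherical projection plus a
# hash-based grouping that keeps ALL points sharing a 2D pixel, instead of
# keeping only one point as conventional spherical projection does.
# H, W, fov_up, fov_down are assumed values typical for 64-beam LiDAR.
import numpy as np
from collections import defaultdict

def spherical_project(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Map 3D points to integer (row, col) pixels on a spherical image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W                        # column from azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * H     # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)
    return v, u

def build_frustums(points, H=64, W=1024):
    """Hash each point by its pixel; points sharing a pixel form one frustum."""
    v, u = spherical_project(points, H, W)
    frustums = defaultdict(list)
    for idx, key in enumerate(v * W + u):   # hash key = flattened pixel index
        frustums[int(key)].append(idx)      # no point is discarded
    return frustums

# Every input point survives in exactly one frustum (no quantized loss).
pts = np.random.randn(100, 3) * 10.0
fr = build_frustums(pts)
assert sum(len(ix) for ix in fr.values()) == len(pts)
```

A conventional range-image pipeline would instead keep only one point per `(v, u)` pixel (e.g. the nearest), which is exactly the quantized information loss the spherical frustum structure avoids; the paper's SFC and F2PS then convolve and sample directly over these per-pixel point groups.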
URL
https://arxiv.org/abs/2311.17491