Abstract
By leveraging recent diffusion models, LiDAR-based large-scale 3D scene generation has achieved great success. While recent voxel-based approaches can generate both geometric structures and semantic labels, existing range-view methods are limited to producing unlabeled LiDAR scenes. Relying on pretrained segmentation models to predict semantic maps for these scenes often results in suboptimal cross-modal consistency. To address this limitation while preserving the advantages of range-view representations, such as computational efficiency and simplified network design, we propose Spiral, a novel range-view LiDAR diffusion model that simultaneously generates depth images, reflectance images, and semantic maps. Furthermore, we introduce novel semantic-aware metrics to evaluate the quality of the generated labeled range-view data. Experiments on the SemanticKITTI and nuScenes datasets demonstrate that Spiral achieves state-of-the-art performance with the smallest parameter size, outperforming two-step methods that combine generative and segmentation models. Additionally, we validate that range images generated by Spiral can be effectively used for synthetic data augmentation in downstream segmentation training, significantly reducing the labeling effort for LiDAR data.
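To make the range-view formulation concrete, the sketch below shows one way depth, reflectance, and per-pixel semantic channels could be stacked into a single tensor so that one diffusion model denoises them jointly. All shapes, channel layouts, and the denoiser interface here are illustrative assumptions for exposition, not the actual Spiral implementation.

import torch

# Hypothetical range-image resolution (e.g., a 64-beam LiDAR) and label set size;
# these values are assumptions, not taken from the paper.
H, W = 64, 1024
NUM_CLASSES = 20  # e.g., roughly the SemanticKITTI label set

# Continuous channels (depth, reflectance) plus per-class semantic channels are
# stacked so a single diffusion model can denoise all modalities together.
x_t = torch.randn(1, 2 + NUM_CLASSES, H, W)  # noisy sample at some timestep t

def denoise_step(model, x_t, t):
    """One reverse-diffusion step over the stacked range-view channels (sketch only)."""
    eps = model(x_t, t)      # predicted noise for all channels jointly
    return x_t - eps         # placeholder update; a real sampler would use a DDPM/DDIM rule

# After sampling, the channels are split back into the three modalities.
depth      = x_t[:, 0]                 # (1, H, W) range/depth image
reflect    = x_t[:, 1]                 # (1, H, W) reflectance image
sem_logits = x_t[:, 2:]                # (1, NUM_CLASSES, H, W)
sem_labels = sem_logits.argmax(dim=1)  # per-pixel semantic map

The point of the sketch is only that the semantic map is produced by the same generative pass as the geometry, rather than by a separate pretrained segmentation model applied afterward.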
URL
https://arxiv.org/abs/2505.22643