
Towards Realistic Scene Generation with LiDAR Diffusion Models

2024-03-31 22:18:56
Haoxi Ran, Vitor Guizilini, Yue Wang

Abstract

Diffusion models (DMs) excel at photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle. This is primarily because DMs operating in the point space struggle to preserve the curve-like patterns and 3D geometry of LiDAR scenes, which consumes much of their representation power. In this paper, we propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes by incorporating geometric priors into the learning pipeline. Our method targets three major desiderata: pattern realism, geometry realism, and object realism. Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for full 3D object context. With these three core designs, our method achieves competitive performance on unconditional LiDAR generation in the 64-beam scenario and state-of-the-art performance on conditional LiDAR generation, while maintaining high efficiency compared to point-based DMs (up to 107× faster). Furthermore, by compressing LiDAR scenes into a latent space, we enable the controllability of DMs with various conditions such as semantic maps, camera views, and text prompts. Our code and pretrained weights are available at this https URL.
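The "curve-wise compression" the abstract describes suggests projecting a LiDAR scan to a range image and downsampling only along the azimuth axis, so each beam's scanline (the curve-like pattern) survives compression intact. Below is a minimal, hypothetical PyTorch sketch of that idea; the module name `CurveWiseEncoder`, channel counts, stage count, and strides are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of curve-wise compression for LiDAR range images: downsample the
# azimuth (width) axis only, keeping every beam row so scanline patterns
# are preserved. Hyperparameters are assumptions, not the authors' values.
import torch
import torch.nn as nn

class CurveWiseEncoder(nn.Module):
    """Compress a range image along width only, leaving beam rows intact."""
    def __init__(self, in_ch=1, latent_ch=8, num_stages=3):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(num_stages):
            # stride (1, 2): halve azimuth resolution, keep all 64 beams
            layers += [nn.Conv2d(ch, 32, kernel_size=3, stride=(1, 2), padding=1),
                       nn.ReLU(inplace=True)]
            ch = 32
        layers += [nn.Conv2d(ch, latent_ch, kernel_size=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# A 64-beam scan rendered as a (batch, 1, 64, 1024) range image
x = torch.randn(2, 1, 64, 1024)
z = CurveWiseEncoder()(x)
print(z.shape)  # torch.Size([2, 8, 64, 128]) -- beams intact, azimuth / 8
```

A diffusion model would then operate on this compact latent grid rather than raw points, which is consistent with the abstract's claimed efficiency gains over point-based DMs and with conditioning the latent on semantic maps, camera views, or text.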

URL

https://arxiv.org/abs/2404.00815

PDF

https://arxiv.org/pdf/2404.00815.pdf

