Fast LiDAR Upsampling using Conditional Diffusion Models

2024-05-08 08:38:28
Sander Elias Magnussen Helgesen, Kazuto Nakashima, Jim Tørresen, Ryo Kurazume


Refining 3D LiDAR data has attracted growing interest, motivated by recent techniques such as supervised learning and generative model-based methods. Existing approaches have shown that diffusion models can generate refined LiDAR data with high fidelity, although the performance and speed of such methods have been limited. These limitations make real-time execution difficult, so the approaches struggle in real-world tasks such as autonomous navigation and human-robot interaction. In this work, we introduce a novel approach based on conditional diffusion models for fast and high-quality sparse-to-dense upsampling of 3D scene point clouds through an image representation. Our method employs denoising diffusion probabilistic models trained with conditional inpainting masks, which have been shown to give high performance on image completion tasks. We conduct a series of experiments, covering multiple datasets, sampling steps, and conditional masks, to determine the ideal configuration, striking a balance between performance and inference speed. We show that our method outperforms the baselines in sampling speed and quality on upsampling tasks using the KITTI-360 dataset. Furthermore, we illustrate the generalization ability of our approach by simultaneously training on real-world and synthetic datasets, introducing variance in quality and environments.
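To make the "conditional inpainting mask" idea concrete, here is a minimal sketch of how a sparse-to-dense mask over a range image might be built. The abstract does not specify the mask layout, so everything here is an assumption for illustration: the function names `make_row_mask` and `apply_mask` are hypothetical, and the layout (keep every `factor`-th scan row, fill the rest with zeros) is just one plausible way to simulate a lower-beam sensor. The diffusion model itself is not shown.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def make_row_mask(height: int, width: int, factor: int) -> Mask:
    """Build a binary mask over a range image (hypothetical layout).

    1 marks pixels that are kept as the sparse conditioning input,
    0 marks pixels the model would have to generate.
    Assumption: sparsity is simulated by keeping every `factor`-th row.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [[1 if r % factor == 0 else 0] * width for r in range(height)]


def apply_mask(range_image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Return the sparse conditioning image: known pixels kept, others set to `fill`."""
    return [
        [value if keep else fill for value, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(range_image, mask)
    ]


if __name__ == "__main__":
    # Toy 8 x 4 "dense" range image; each row holds a constant depth value.
    dense = [[float(r + 1)] * 4 for r in range(8)]
    mask = make_row_mask(height=8, width=4, factor=4)
    sparse = apply_mask(dense, mask)
    for row in sparse:
        print(row)
```

In a conditional setup of this kind, the masked image and the mask would typically be passed to the denoising network together with the noisy sample, so the network can tell measured pixels from missing ones. Whether the authors do exactly this is not stated in the abstract.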



