
SemCity: Semantic Scene Generation with Triplane Diffusion

2024-03-12 15:59:08
Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon

Abstract

We present "SemCity," a 3D diffusion model for semantic scene generation in real-world outdoor environments. Most 3D diffusion models focus on generating a single object, synthetic indoor scenes, or synthetic outdoor scenes, while the generation of real-world outdoor scenes is rarely addressed. In this paper, we concentrate on generating a real-outdoor scene through learning a diffusion model on a real-world outdoor dataset. In contrast to synthetic data, real-outdoor datasets often contain more empty spaces due to sensor limitations, causing challenges in learning real-outdoor distributions. To address this issue, we exploit a triplane representation as a proxy form of scene distributions to be learned by our diffusion model. Furthermore, we propose a triplane manipulation that integrates seamlessly with our triplane diffusion model. The manipulation improves our diffusion model's applicability in a variety of downstream tasks related to outdoor scene generation such as scene inpainting, scene outpainting, and semantic scene completion refinements. In experimental results, we demonstrate that our triplane diffusion model shows meaningful generation results compared with existing work in a real-outdoor dataset, SemanticKITTI. We also show our triplane manipulation facilitates seamlessly adding, removing, or modifying objects within a scene. Further, it also enables the expansion of scenes toward a city-level scale. Finally, we evaluate our method on semantic scene completion refinements where our diffusion model enhances predictions of semantic scene completion networks by learning scene distribution. Our code is available at this https URL.

URL

https://arxiv.org/abs/2403.07773

PDF

https://arxiv.org/pdf/2403.07773.pdf

