Paper Reading AI Learner

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

2024-05-02 00:04:02
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.

Abstract (translated)

我们提出了EchoScene,一种交互式且可控制的三维室内场景生成模型,它在场景图上生成3D室内场景。EchoScene利用了双分支扩散模型,该模型动态地适应场景图。由于节点数量、边组合和操纵器诱导的节点边缘操作等因素的不同,现有的方法在处理场景图时遇到困难。EchoScene通过将每个节点与去噪过程相关联,实现了协同信息交流,增强了可控制和一致性的生成,同时考虑了全局约束。这是通过形状和布局分支的信息回声方案实现的。在去噪步骤中,所有进程将去噪数据共享给一个信息交换单元,该单元使用图卷积对这些更新进行组合。该方案确保了去噪过程受到场景图的整体理解的影响,从而促进了全局一致场景的生成。生成的场景在推理过程中可以进行编辑,并从扩散模型的噪声中采样。大量实验验证了我们的方法,保持场景的可控性并超过了前方法在生成质量上的表现。此外,生成的场景具有高质量,因此与现成的纹理生成兼容。代码和训练的模型都是开源的。

URL

https://arxiv.org/abs/2405.00915

PDF

https://arxiv.org/pdf/2405.00915.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot