Abstract
We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes conditioned on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enabling collaborative information exchange, enhancing controllable and consistent generation that is aware of global constraints. This is achieved through an information echo scheme in both the shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and resampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.
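To make the information echo scheme concrete, here is a minimal, hedged sketch of one denoising step. It is not the authors' implementation: the per-node denoising update is stood in for by a toy shrinkage toward zero, and the information exchange unit is a single mean-aggregation graph convolution; the names `echo_denoise_step`, `graph_conv`, and `neighbors` are illustrative, not from the paper.

```python
# Toy sketch of the "information echo" idea: each scene-graph node runs its
# own denoising update, then all updates are combined through a graph
# convolution so every process sees a globally informed latent.
import numpy as np

rng = np.random.default_rng(0)

def neighbors(i, edges):
    """All nodes connected to node i by an (undirected) edge."""
    return [b if a == i else a for a, b in edges if i in (a, b)]

def graph_conv(latents, edges, w_self, w_nbr):
    """Single-layer graph convolution: each node mixes its own latent
    with the mean of its neighbors' latents (the exchange unit here)."""
    out = latents @ w_self
    for i in range(latents.shape[0]):
        nbrs = neighbors(i, edges)
        if nbrs:
            out[i] += latents[nbrs].mean(axis=0) @ w_nbr
    return out

def echo_denoise_step(latents, edges, w_self, w_nbr, step_size=0.1):
    """One echo step: (1) per-node denoising (toy shrinkage standing in
    for a learned network), (2) global exchange via graph convolution."""
    local = latents - step_size * latents
    return graph_conv(local, edges, w_self, w_nbr)

d = 4
latents = rng.standard_normal((3, d))   # one latent per object node
edges = [(0, 1), (1, 2)]                # scene-graph edges between nodes
w_self, w_nbr = np.eye(d), 0.1 * np.eye(d)

for t in range(5):                      # a few denoising steps with echo
    latents = echo_denoise_step(latents, edges, w_self, w_nbr)
```

The same loop structure runs in both the shape and layout branches described above, so the two generations stay consistent with each other and with the graph's global constraints.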
URL
https://arxiv.org/abs/2405.00915