MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text

2024-03-30 12:50:25
Takayuki Hara, Tatsuya Harada

Abstract

The generation of 3D scenes from user-specified conditions offers a promising avenue for alleviating the production burden in 3D applications. Previous studies required significant effort to realize the desired scene because the available control conditions were limited. We propose a method for controlling and generating 3D scenes under multimodal conditions: partial images, layout information represented in the top view, and text prompts. Combining these conditions to generate a 3D scene involves three significant difficulties: (1) creating large datasets, (2) reflecting the interactions among the multimodal conditions, and (3) the domain dependence of the layout conditions. We decompose 3D scene generation into two steps: generating a 2D image from the given conditions, and generating a 3D scene from that 2D image. 2D image generation is achieved by fine-tuning a pretrained text-to-image model on a small artificial dataset of partial images and layouts, and 3D scene generation is achieved by layout-conditioned depth estimation and neural radiance fields (NeRF), thereby avoiding the need for large datasets. Using 360-degree images as a common representation of spatial information allows the interactions among the multimodal conditions to be taken into account and reduces the domain dependence of the layout control. Experimental results qualitatively and quantitatively demonstrate that the proposed method can generate 3D scenes in diverse domains, from indoor to outdoor, according to multimodal conditions.
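The abstract's central device is a shared spatial representation: all three conditions (partial image, top-view layout, text) are brought into a single 360-degree equirectangular frame before generation. As a rough, self-contained illustration of that representation (not the authors' code; the function name, canvas size, and field of view below are assumptions), the sketch embeds a perspective partial image into an equirectangular canvas:

```python
# Minimal sketch of the shared 360-degree representation described in the
# abstract: a perspective "partial image" is pasted into an equirectangular
# canvas so that image, layout, and text conditions can later be handled in
# one spatial frame. Names and defaults are illustrative, not the paper's.
import numpy as np

def embed_partial_image(partial, canvas_hw=(512, 1024), fov_deg=90.0):
    """Paste a perspective image (h, w, 3) into an equirectangular canvas.

    Assumes the camera looks down the +z axis with the given horizontal
    field of view and square pixels.
    """
    H, W = canvas_hw
    ph, pw = partial.shape[:2]
    canvas = np.zeros((H, W, 3), dtype=partial.dtype)

    # Longitude in [-pi, pi) and latitude in (-pi/2, pi/2) per canvas pixel.
    lon = (np.arange(W) + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit view direction for every equirectangular pixel.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Perspective projection onto the partial image's pixel grid.
    f = (pw / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)  # focal length in px
    valid = z > 1e-6                                    # in front of camera
    u = f * x / np.maximum(z, 1e-6) + pw / 2.0
    v = -f * y / np.maximum(z, 1e-6) + ph / 2.0         # image v runs downward

    inside = valid & (u >= 0) & (u < pw) & (v >= 0) & (v < ph)
    canvas[inside] = partial[v[inside].astype(int), u[inside].astype(int)]
    return canvas

# A 90-degree-FoV partial image covers roughly the central quarter of the
# panorama's width; the rest is left empty for the generator to fill.
pano = embed_partial_image(np.full((256, 256, 3), 255, dtype=np.uint8))
```

In the pipeline the abstract outlines, the empty canvas regions would then be completed by the fine-tuned text-to-image model under the layout and text conditions, and the finished panorama passed to layout-conditioned depth estimation and NeRF fitting to obtain the 3D scene.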

URL

https://arxiv.org/abs/2404.00345

PDF

https://arxiv.org/pdf/2404.00345.pdf

