Abstract
Designing 3D scenes is traditionally a challenging task that demands both artistic expertise and proficiency with complex software. Recent advances in text-to-3D generation have greatly simplified this process by letting users create scenes from simple text descriptions. However, because these methods generally require extra training or in-context learning, their performance is often hindered by the limited availability of high-quality 3D data. In contrast, modern text-to-image models trained on web-scale images can generate scenes with diverse, reliable spatial layouts and consistent, visually appealing styles. Our key insight is that instead of learning directly from 3D scenes, we can leverage generated 2D images as an intermediary to guide 3D synthesis. In light of this, we introduce ArtiScene, a training-free automated pipeline for scene design that integrates the flexibility of free-form text-to-image generation with the diversity and reliability of 2D intermediary layouts. First, we generate 2D images from a scene description, then extract the shape and appearance of objects to create 3D models. These models are assembled into the final scene using geometry, position, and pose information derived from the same intermediary image. Because it generalizes to a wide range of scenes and styles, ArtiScene outperforms state-of-the-art benchmarks by a large margin in layout and aesthetic quality, as measured by quantitative metrics. It also averages a 74.89% winning rate in extensive user studies and 95.07% in GPT-4o evaluation.
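The data flow described above (text to intermediary image, image to per-object shape and placement, then assembly) can be sketched roughly as follows. This is only an illustrative skeleton under stated assumptions, not the ArtiScene implementation: the names `DetectedObject`, `PlacedAsset`, `design_scene`, and the three callables are hypothetical, and the actual image generator, object extractor, and 3D model generator are left as pluggable functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DetectedObject:
    """One object found in the intermediary image (all fields are hypothetical)."""
    label: str
    position: Tuple[float, float, float]  # estimated 3D position
    size: Tuple[float, float, float]      # estimated bounding-box extents
    yaw_degrees: float                    # estimated rotation about the up axis


@dataclass
class PlacedAsset:
    """A 3D asset paired with the placement derived from the same image."""
    mesh_id: str
    source: DetectedObject


def design_scene(
    description: str,
    text_to_image: Callable[[str], object],
    detect_objects: Callable[[object], List[DetectedObject]],
    make_asset: Callable[[object, DetectedObject], str],
) -> List[PlacedAsset]:
    """Run the three stages in order; every stage is supplied by the caller."""
    # Stage 1: generate a 2D intermediary image from the scene description.
    image = text_to_image(description)
    # Stage 2: extract objects with their geometry, position, and pose.
    objects = detect_objects(image)
    # Stage 3: build one 3D asset per object and keep its placement alongside it.
    return [PlacedAsset(mesh_id=make_asset(image, obj), source=obj) for obj in objects]
```

Passing the stages in as callables keeps the sketch training-free in spirit: any off-the-shelf model can be substituted without changing the assembly logic.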
URL
https://arxiv.org/abs/2506.00742