Abstract
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle to maintain high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a novel text-to-3D scene generation framework based on 3D Gaussians, to tackle these three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to quickly form semantically rich, high-quality representations. FPS uses 3D Gaussian filtering for optimization stability and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Finally, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at this https URL.
URL
https://arxiv.org/abs/2404.03575