Abstract
In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our framework starts from an efficient bird's-eye-view (BEV) representation generated from simplex noise, which consists of a height field and a semantic field. The height field represents the surface elevation of 3D scenes, while the semantic field provides detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Furthermore, we propose a novel generative neural hash grid to parameterize the latent space given 3D positions and the scene semantics, which aims to encode generalizable features across scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
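To make the BEV representation described above concrete, below is a minimal, illustrative sketch (not the authors' implementation) of sampling a height field and a semantic field on a 2D grid. Simplex noise is approximated here with multi-octave value noise so the example needs only NumPy and SciPy, and the label set and thresholds (water, sand, grass, forest, rock, snow) are assumptions made for illustration rather than values taken from the paper.

```python
import numpy as np
from scipy.ndimage import zoom

# Assumed, illustrative label set; the abstract does not specify the semantic classes.
LABELS = ["water", "sand", "grass", "forest", "rock", "snow"]

def value_noise(size: int, octaves: int = 5, seed: int = 0) -> np.ndarray:
    """Smooth 2D noise in [0, 1]: multi-octave value noise as a stand-in for simplex noise."""
    rng = np.random.default_rng(seed)
    field = np.zeros((size, size))
    amp, norm = 1.0, 0.0
    for o in range(octaves):
        res = 2 ** (o + 2)                          # coarser random grids for lower octaves
        coarse = rng.random((res, res))
        field += amp * zoom(coarse, size / res, order=3)[:size, :size]
        norm += amp
        amp *= 0.5                                  # halve the amplitude per octave
    field /= norm
    lo, hi = field.min(), field.max()
    return (field - lo) / (hi - lo + 1e-8)

def bev_scene(size: int = 256, seed: int = 0):
    """Sample one scene: returns (height_field, semantic_field), both of shape (size, size)."""
    height = value_noise(size, seed=seed)           # surface elevation in [0, 1]
    moisture = value_noise(size, seed=seed + 1)     # auxiliary channel for vegetation
    semantic = np.full((size, size), LABELS.index("grass"))
    semantic[height < 0.30] = LABELS.index("water")
    semantic[(height >= 0.30) & (height < 0.35)] = LABELS.index("sand")
    semantic[(height >= 0.35) & (moisture > 0.60)] = LABELS.index("forest")
    semantic[height >= 0.70] = LABELS.index("rock")
    semantic[height >= 0.85] = LABELS.index("snow")
    return height, semantic

if __name__ == "__main__":
    h, s = bev_scene(seed=42)
    print(h.shape, s.shape, [LABELS[i] for i in np.unique(s)])
```

In this reading, each (x, y) cell of the grid can be lifted to 3D by treating the height value as surface elevation and assigning the cell's semantic label to points near that surface, which is roughly the role the BEV representation plays before the generative hash grid and volumetric renderer described in the abstract take over.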
URL
https://arxiv.org/abs/2302.01330