Abstract
3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. Early methods based on procedural rules offered scalability but limited diversity. Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF, 3D Gaussians) have enabled the learning of real-world scene distributions, improving fidelity, diversity, and view consistency. More recently, diffusion models have bridged 3D scene synthesis and photorealism by reframing generation as an image or video synthesis problem. This survey provides a systematic overview of state-of-the-art approaches, organizing them into four paradigms: procedural generation, neural 3D-based generation, image-based generation, and video-based generation. We analyze their technical foundations, trade-offs, and representative results, and review commonly used datasets, evaluation protocols, and downstream applications. We conclude by discussing key challenges in generation capacity, 3D representation, data and annotations, and evaluation, and outline promising directions including higher fidelity, physics-aware and interactive generation, and unified perception-generation models. This review organizes recent advances in 3D scene generation and highlights promising directions at the intersection of generative AI, 3D vision, and embodied intelligence. To track ongoing developments, we maintain an up-to-date project page: this https URL.
URL
https://arxiv.org/abs/2505.05474