SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

2023-02-02 18:59:16
Zhaoxi Chen, Guangcong Wang, Ziwei Liu

Abstract

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our framework starts from an efficient bird's-eye-view (BEV) representation generated from simplex noise, which consists of a height field and a semantic field. The height field represents the surface elevation of 3D scenes, while the semantic field provides detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Furthermore, we propose a novel generative neural hash grid to parameterize the latent space given 3D positions and the scene semantics, which aims to encode generalizable features across scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
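
To make the BEV representation described in the abstract concrete, here is a minimal sketch, not the authors' implementation. It builds a height field from smooth procedural noise and derives a semantic field from it, so a scene is stored as two N x N maps (quadratic in N) rather than an N x N x N volume. Several things here are assumptions made for illustration: interpolated value noise stands in for simplex noise, the semantic labels come from hypothetical height thresholds, and the names `value_noise` and `make_bev` are invented.

```python
import numpy as np


def value_noise(size, cells, rng):
    """Smooth 2D noise from a bilinearly interpolated random lattice.

    Assumption: this is a simple stand-in for simplex noise.
    """
    lattice = rng.random((cells + 1, cells + 1))
    t = np.linspace(0.0, cells, size, endpoint=False)
    i = t.astype(int)            # lattice cell index for each pixel
    f = t - i                    # fractional position inside the cell
    f = f * f * (3.0 - 2.0 * f)  # smoothstep easing to hide grid seams
    fx = f[None, :]
    fy = f[:, None]
    top = lattice[i][:, i] * (1 - fx) + lattice[i][:, i + 1] * fx
    bottom = lattice[i + 1][:, i] * (1 - fx) + lattice[i + 1][:, i + 1] * fx
    return top * (1 - fy) + bottom * fy


def make_bev(size=64, seed=0):
    """Return (height, semantic) maps, each of shape (size, size)."""
    rng = np.random.default_rng(seed)
    # Sum three octaves of noise; the weights add up to 0.875.
    height = (
        0.5 * value_noise(size, 4, rng)
        + 0.25 * value_noise(size, 8, rng)
        + 0.125 * value_noise(size, 16, rng)
    ) / 0.875
    # Hypothetical thresholds: 0 = low, 1 = mid, 2 = high terrain class.
    semantic = np.digitize(height, [0.35, 0.6])
    return height.astype(np.float32), semantic.astype(np.int64)


height, semantic = make_bev(size=64, seed=0)
print(height.shape, semantic.shape)
```

Geometry (the height map) and semantics (the label map) live in separate arrays, which mirrors the disentanglement the abstract mentions. In the paper the semantic field is part of the generated representation, so thresholding the height here is only a placeholder.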

URL

https://arxiv.org/abs/2302.01330

PDF

https://arxiv.org/pdf/2302.01330.pdf

