Persistent Nature: A Generative Model of Unbounded 3D Worlds

2023-03-23 17:59:40
Lucy Chai, Richard Tucker, Zhengqi Li, Phillip Isola, Noah Snavely

Abstract

Despite increasingly realistic image quality, recent 3D image generative models often operate on 3D volumes of fixed extent with limited camera motions. We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. Our scene representation consists of an extendable, planar scene layout grid, which can be rendered from arbitrary camera poses via a 3D decoder and volume rendering, and a panoramic skydome. Based on this representation, we learn a generative world model solely from single-view internet photos. Our method enables simulating long flights through 3D landscapes, while maintaining global scene consistency--for instance, returning to the starting point yields the same view of the scene. Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation that stands in contrast to auto-regressive 3D prediction models.
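
To make the persistence claim concrete, here is a toy sketch of rendering from a camera-independent layout. It is not the authors' implementation. The class `LayoutGrid`, the functions `decode` and `render_ray`, the hash-based height field, and the constant sky value are all hypothetical stand-ins chosen for illustration: the paper's layout grid holds learned features, its decoder is a neural network, and its sky is a panoramic skydome. The only part taken from standard practice is the emission-absorption quadrature used for volume rendering.

```python
import math


class LayoutGrid:
    """Hypothetical 2D layout, extendable on demand and indexed by world cell."""

    def __init__(self, seed=0):
        self.seed = seed
        self.cells = {}  # (ix, iz) -> terrain height in [0, 1)

    def height(self, ix, iz):
        key = (ix, iz)
        if key not in self.cells:
            # Deterministic pseudo-random value, so any cell can be created
            # lazily and always comes out the same (assumption for this toy).
            h = math.sin(ix * 12.9898 + iz * 78.233 + self.seed) * 43758.5453
            self.cells[key] = h - math.floor(h)
        return self.cells[key]


def decode(grid, x, y, z):
    """Toy stand-in for a 3D decoder: returns (density, color) at a 3D point."""
    h = grid.height(math.floor(x), math.floor(z))
    density = 5.0 if y < h else 0.0  # solid below the terrain surface
    return density, h                # grayscale color tied to the layout value


def render_ray(grid, origin, direction, sky=1.0, n_samples=64, far=8.0):
    """Emission-absorption volume rendering along one ray by quadrature."""
    dt = far / n_samples
    transmittance, pixel = 1.0, 0.0
    for i in range(n_samples):
        t = (i + 0.5) * dt
        point = [o + t * d for o, d in zip(origin, direction)]
        sigma, color = decode(grid, *point)
        alpha = 1.0 - math.exp(-sigma * dt)
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    # Whatever light is not absorbed sees the sky (a constant in this toy).
    return pixel + transmittance * sky


grid = LayoutGrid(seed=1)
start_pose = ((0.5, 1.5, 0.5), (0.6, -0.3, 0.74))
first = render_ray(grid, *start_pose)
render_ray(grid, (40.5, 1.5, -12.5), (0.0, -0.3, 0.95))  # fly somewhere else
again = render_ray(grid, *start_pose)
assert first == again  # same pose, same world, same pixel
```

Because the pixel depends only on the world-space layout and the camera pose, and not on previously rendered frames, revisiting a pose reproduces the view exactly. An auto-regressive model that predicts each frame from the previous one has no such guarantee.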

URL

https://arxiv.org/abs/2303.13515

PDF

https://arxiv.org/pdf/2303.13515.pdf

