Paper Reading AI Learner

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

2024-04-04 16:38:57
Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou

Abstract

Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle to maintain high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a novel 3D Gaussian-based text-to-3D scene generation framework, to tackle these three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to quickly form semantically rich, high-quality representations. FPS uses 3D Gaussian filtering for optimization stability and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, designed specifically for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at this https URL.
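To make the "multi-timestep sampling" idea concrete: score-distillation pipelines typically draw one diffusion timestep per optimization step, while FPS draws several, with the useful range shifting as the 3D representation forms (large timesteps shape coarse structure early; small timesteps refine texture late). The sketch below is a hypothetical illustration of that annealing schedule, not the paper's actual algorithm; all names, bounds, and the linear annealing rule are assumptions.

```python
import random

def sample_timesteps(step, total_steps, n=3, t_max=980, t_min=20):
    """Hypothetical multi-timestep sampler (not DreamScene's exact FPS).

    Draws n diffusion timesteps per optimization step, linearly annealing
    the upper bound so early steps favor large (coarse-structure) timesteps
    and late steps favor small (texture-refinement) ones.
    """
    frac = step / total_steps                 # optimization progress in [0, 1]
    hi = int(t_max - frac * (t_max - 300))    # upper bound anneals 980 -> 300
    lo = max(t_min, hi // 10)                 # keep a floor below the bound
    return sorted(random.randint(lo, hi) for _ in range(n))
```

For example, `sample_timesteps(0, 1000)` draws from roughly [98, 980], while `sample_timesteps(900, 1000)` draws from a much lower band, mirroring the coarse-to-fine formation pattern the abstract describes.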

URL

https://arxiv.org/abs/2404.03575

PDF

https://arxiv.org/pdf/2404.03575.pdf

