Paper Reading AI Learner

SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

2024-03-23 03:23:29
Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

Abstract

Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g., point clouds or radiance fields) that are incompatible with the industrial pipeline, which leaves a substantial gap between academic research and industrial deployment. Procedural Controllable Generation (PCG) is an efficient technique for creating scalable, high-quality assets, but it is unfriendly to ordinary users because it demands profound domain expertise. To address these issues, we use a large language model (LLM) to drive the procedural modeling. In this paper, we introduce a large-scale scene generation framework, SceneX, which can automatically produce high-quality procedural models according to designers' textual descriptions. Specifically, the proposed method comprises two components, PCGBench and PCGPlanner. The former encompasses an extensive collection of accessible procedural assets and thousands of hand-crafted API documents. The latter generates executable actions for Blender that produce controllable and precise 3D assets guided by the user's instructions. SceneX can generate a city spanning 2.5 km × 2.5 km with a delicate layout and geometric structures, drastically reducing the time cost from several weeks for professional PCG engineers to just a few hours for an ordinary user. Extensive experiments demonstrate the capability of our method in controllable large-scale scene generation and editing, including asset placement and season translation.
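The abstract describes PCGPlanner only at a high level: it turns a user's instruction into executable actions for Blender, grounded in the API documents collected in PCGBench. The sketch below illustrates that general retrieve, plan, and validate loop in plain Python. Everything in it is hypothetical: the API names (`generate_terrain`, `set_season`, and so on), the keyword-matching `plan` function that stands in for the LLM, and the validation step are assumptions for illustration, not the authors' implementation or PCGBench's actual API, and no Blender (`bpy`) calls are made.

```python
from dataclasses import dataclass, field

SEASONS = ("spring", "summer", "autumn", "winter")


@dataclass
class ApiDoc:
    """Hypothetical stand-in for one hand-crafted API document."""
    name: str
    description: str
    params: dict[str, type]      # parameter name -> expected Python type
    keywords: tuple[str, ...]    # words that make this API relevant to a request


@dataclass
class Action:
    """One executable step the planner emits (hypothetical format)."""
    api: str
    args: dict[str, object] = field(default_factory=dict)


# Hypothetical registry; NOT the real contents of PCGBench.
REGISTRY: dict[str, ApiDoc] = {
    doc.name: doc
    for doc in [
        ApiDoc("generate_terrain", "Create a terrain patch", {"size_km": float}, ("terrain", "land", "city")),
        ApiDoc("generate_road_network", "Lay out roads", {"density": float}, ("road", "street", "city")),
        ApiDoc("place_buildings", "Place building assets", {"count": int}, ("building", "house", "city")),
        ApiDoc("set_season", "Change the scene's season", {"season": str}, SEASONS),
    ]
}

DEFAULTS: dict[type, object] = {float: 1.0, int: 10, str: ""}


def tokenize(text: str) -> list[str]:
    """Lower-case the request and strip simple punctuation."""
    return [w.strip(".,;:!?") for w in text.lower().split()]


def retrieve(instruction: str, registry: dict[str, ApiDoc]) -> list[ApiDoc]:
    """Return the API docs whose keywords overlap the request's words."""
    words = set(tokenize(instruction))
    return [doc for doc in registry.values() if words & set(doc.keywords)]


def plan(instruction: str, registry: dict[str, ApiDoc]) -> list[Action]:
    """Toy planner standing in for the LLM: one action per retrieved doc,
    filled with default arguments (an LLM would infer them from the text)."""
    words = tokenize(instruction)
    actions = []
    for doc in retrieve(instruction, registry):
        args = {p: DEFAULTS[t] for p, t in doc.params.items()}
        if doc.name == "set_season":
            # Retrieval only matched this doc because a season word is present.
            args["season"] = next(w for w in words if w in SEASONS)
        actions.append(Action(doc.name, args))
    return actions


def validate(actions: list[Action], registry: dict[str, ApiDoc]) -> list[Action]:
    """Reject actions that do not match their API documents before execution."""
    for act in actions:
        doc = registry.get(act.api)
        if doc is None:
            raise ValueError(f"unknown API: {act.api}")
        if set(act.args) != set(doc.params):
            raise ValueError(f"wrong arguments for {act.api}: {sorted(act.args)}")
        for name, value in act.args.items():
            if not isinstance(value, doc.params[name]):
                raise TypeError(f"{act.api}.{name} expects {doc.params[name].__name__}")
    return actions


if __name__ == "__main__":
    request = "A winter city with roads"
    for act in validate(plan(request, REGISTRY), REGISTRY):
        print(act.api, act.args)  # a real system would dispatch to Blender here
```

Checking every action against its API document before execution reflects the abstract's emphasis on grounding generation in documented procedural APIs; in this toy version an unknown API or a mistyped argument is rejected instead of being passed on.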

URL

https://arxiv.org/abs/2403.15698

PDF

https://arxiv.org/pdf/2403.15698.pdf
