Recent Advance in 3D Object and Scene Generation: A Survey

2025-04-16 03:22:06
Xiang Tang, Ruotong Li, Xiaopeng Fan

Abstract

In recent years, the demand for 3D content has grown exponentially with the intelligent upgrading of the interactive media, extended reality (XR), and Metaverse industries. To overcome the limitations of traditional manual modeling approaches, such as labor-intensive workflows and prolonged production cycles, revolutionary advances have been achieved through the convergence of novel 3D representation paradigms and artificial intelligence generative technologies. In this survey, we conduct a systematic review of the cutting-edge achievements in static 3D object and scene generation and establish a comprehensive technical framework through systematic categorization. Specifically, we begin our analysis with mainstream 3D object representations, followed by an in-depth exploration of two principal technical pathways in object generation: data-driven supervised learning methods and deep generative model-based approaches. Regarding scene generation, we focus on three dominant paradigms: layout-guided compositional synthesis, 2D prior-based scene generation, and rule-driven modeling. Finally, we critically examine persistent challenges in 3D generation and propose potential directions for future research. This survey aims to give readers a structured understanding of state-of-the-art 3D generation technologies while inspiring researchers to explore this domain further.

URL

https://arxiv.org/abs/2504.11734

PDF

https://arxiv.org/pdf/2504.11734.pdf
