Paper Reading AI Learner

Interactive3D: Create What You Want by Interactive 3D Generation

2024-04-25 11:06:57
Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

Abstract

3D object generation has advanced significantly and now yields high-quality results. However, existing methods fall short of precise user control and often produce results that do not match user expectations, which limits their applicability. Generating the 3D object a user envisions remains difficult with current generative models because their interaction capabilities are limited. Existing methods mainly take one of two approaches: (i) interpreting textual instructions, which offers only constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both confine customization to what a 2D reference can express and may introduce undesirable artifacts when lifting to 3D, leaving little room for direct and versatile 3D modifications. In this work, we introduce Interactive3D, a framework for interactive 3D generation that gives users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D consists of two cascaded stages that use different 3D representations. The first stage uses Gaussian Splatting for direct user interaction, letting users modify and guide the generation at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. The Gaussian splats are then converted into an InstantNGP representation. In the second stage, a novel (v) Interactive Hash Refinement module adds further detail and extracts the geometry. Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation. Our project webpage is available at this https URL.
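The abstract does not say how Interactive3D implements its first-stage edits, so the following is only a minimal sketch of what operations (i) and (iii) can look like on a Gaussian Splatting representation. It assumes each splat is a 3D Gaussian with a mean, a 3x3 covariance, and an opacity, and that the user selects a component with an axis-aligned box; the names `Gaussian`, `similarity_transform`, `transform_selected`, `remove_selected`, and `add_component` are hypothetical. It relies on the standard identity that a Gaussian transformed by x -> sRx + t has mean sRμ + t and covariance s²RΣRᵀ.

```python
import math
from dataclasses import dataclass, replace

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical splat: mean, row-major 3x3 covariance, opacity."""
    mean: Vec3
    cov: Mat3
    opacity: float


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def transpose(a: Mat3) -> Mat3:
    return tuple(tuple(a[j][i] for j in range(3)) for i in range(3))


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(a[i][k] * v[k] for k in range(3)) for i in range(3))


def rotation_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def similarity_transform(g: Gaussian, R: Mat3, t: Vec3, scale: float = 1.0) -> Gaussian:
    """Apply x -> scale * R x + t: mean' = s R mu + t, cov' = s^2 R Sigma R^T."""
    m = matvec(R, g.mean)
    mean = tuple(scale * m[i] + t[i] for i in range(3))
    rotated = matmul(matmul(R, g.cov), transpose(R))
    cov = tuple(tuple(scale * scale * x for x in row) for row in rotated)
    return replace(g, mean=mean, cov=cov)  # opacity is unchanged


def in_box(p: Vec3, lo: Vec3, hi: Vec3) -> bool:
    return all(lo[i] <= p[i] <= hi[i] for i in range(3))


def transform_selected(splats, lo, hi, R, t, scale=1.0):
    """Geometric transformation of only the splats whose means lie in the box."""
    return [similarity_transform(g, R, t, scale) if in_box(g.mean, lo, hi) else g
            for g in splats]


def remove_selected(splats, lo, hi):
    """Remove a component: drop every splat whose mean lies in the box."""
    return [g for g in splats if not in_box(g.mean, lo, hi)]


def add_component(splats, component):
    """Add a component: append another set of splats to the scene."""
    return list(splats) + list(component)
```

Rotating a splat elongated along x by 90 degrees about z moves its mean from (1, 0, 0) to roughly (0, 1, 0) and moves the large variance onto the y-axis. Transforming the covariance, and not only the mean, is what keeps each splat's shape and orientation consistent with the edit.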


URL

https://arxiv.org/abs/2404.16510

PDF

https://arxiv.org/pdf/2404.16510.pdf

