Paper Reading AI Learner

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians

2024-10-02 13:26:28
Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao

Abstract

Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.

Abstract (translated)

近年来,随着神经元辐射场和Gaussian合成的发展,3D 重建技术取得了显著的高保真度。然而,这些方法学所学习的潜在表示高度纠缠且缺乏可解释性。在本文中,我们提出了一种新颖的部分感知组合重建方法,称为GaussianBlock,该方法实现了语义上可解释和去中心化的表示,允许用户精确地编辑和物理地编辑类似于构建块,同时保持高保真度。 我们的GaussianBlock引入了一种结合原始数据和3D高斯信息的混合表示,这是通过对2D语义 prior 的自适应加权获得的。此外,我们采用了一种动态的分裂和合并策略,以实现语义上可解释的原始表示。此外,我们还利用与原始数据混合的3D高斯来优化结构和增强保真度。 此外,我们还采用了一种绑定继承策略来加强和维持这两个之间的联系。通过在各种基准测试中进行验证,我们的重构场景被证明具有可解释性、可组合性和紧凑性,这使得用户可以在不降低质量的情况下实现直接、精确和无缝的编辑。

URL

https://arxiv.org/abs/2410.01535

PDF

https://arxiv.org/pdf/2410.01535.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot