Paper Reading AI Learner

InpaintNeRF360: Text-Guided 3D Inpainting on Unbounded Neural Radiance Fields

2023-05-24 12:22:23
Dongqing Wang, Tong Zhang, Alaa Abboud, Sabine Süsstrunk

Abstract

Neural Radiance Fields (NeRF) can generate highly realistic novel views. However, editing 3D scenes represented by NeRF across 360-degree views, particularly removing objects while preserving geometric and photometric consistency, remains a challenging problem due to NeRF's implicit scene representation. In this paper, we propose InpaintNeRF360, a unified framework that uses natural language instructions as guidance for inpainting NeRF-based 3D scenes. Our approach employs a promptable segmentation model, generating multi-modal prompts from the encoded text to perform multiview segmentation. We apply depth-space warping to enforce view consistency across the segmentations, and further refine the inpainted NeRF model using perceptual priors to ensure visual plausibility. InpaintNeRF360 can simultaneously remove multiple objects or modify object appearance based on text instructions while synthesizing 3D view-consistent and photo-realistic inpainting. Through extensive experiments on both unbounded and frontal-facing scenes trained through NeRF, we demonstrate the effectiveness of our approach and showcase its potential to enhance the editability of implicit radiance fields.
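The abstract does not specify how its depth-space warping is implemented. As a generic illustration only, not InpaintNeRF360's actual method, the sketch below shows the standard geometric idea behind depth-based warping: segmentation-mask pixels from a source view are lifted to 3D using per-pixel depth and then reprojected into a target view. The `Camera` class, the helper names, and the world-to-camera convention `x_cam = R @ x_world + t` are hypothetical choices made for this example; occlusion checks and hole filling are deliberately omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


@dataclass
class Camera:
    """Hypothetical pinhole camera. R, t map world to camera: x_cam = R @ x_world + t."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3
    t: Vec3


def _matvec(M: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


def _transpose(M: Mat3) -> Mat3:
    return [[M[j][i] for j in range(3)] for i in range(3)]


def backproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with z-depth `depth` to a world-space point."""
    x_cam = ((u - cam.cx) / cam.fx * depth, (v - cam.cy) / cam.fy * depth, depth)
    # Invert x_cam = R x_world + t, i.e. x_world = R^T (x_cam - t) for a rotation R.
    diff = tuple(x_cam[i] - cam.t[i] for i in range(3))
    return _matvec(_transpose(cam.R), diff)


def project(cam: Camera, x_world: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates; None if it lies behind the camera."""
    rotated = _matvec(cam.R, x_world)
    x_cam = tuple(rotated[i] + cam.t[i] for i in range(3))
    if x_cam[2] <= 0:
        return None
    return (cam.fx * x_cam[0] / x_cam[2] + cam.cx,
            cam.fy * x_cam[1] / x_cam[2] + cam.cy)


def warp_mask(mask: Set[Pixel], depths: Dict[Pixel, float],
              src: Camera, dst: Camera, width: int, height: int) -> Set[Pixel]:
    """Warp masked pixels from the src view into integer pixels of the dst view."""
    warped: Set[Pixel] = set()
    for (u, v) in mask:
        p = project(dst, backproject(src, u, v, depths[(u, v)]))
        if p is None:
            continue  # point is behind the target camera
        uu, vv = round(p[0]), round(p[1])
        if 0 <= uu < width and 0 <= vv < height:
            warped.add((uu, vv))
    return warped
```

For example, with identical intrinsics (`fx = fy = 100`, principal point `(50, 50)`), a mask pixel at `(60, 50)` with depth `2.0` lands at `(55, 50)` in a second camera shifted `0.1` units along x. In practice, depth would come from the trained NeRF, and warped masks from several views could then be compared or merged to enforce consistency; that combination step is not shown here.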


URL

https://arxiv.org/abs/2305.15094

PDF

https://arxiv.org/pdf/2305.15094.pdf

