Paper Reading AI Learner

GenN2N: Generative NeRF2NeRF Translation

2024-04-03 14:56:06
Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi

Abstract

We present GenN2N, a unified NeRF-to-NeRF translation framework for various NeRF translation tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc. Unlike previous methods designed for individual translation tasks with task-specific schemes, GenN2N achieves all these NeRF editing tasks by employing a plug-and-play image-to-image translator to perform editing in the 2D domain and lifting these 2D edits into the 3D NeRF space. Since the 3D consistency of the 2D edits is not assured, we propose to model the distribution of the underlying 3D edits through a generative model that can cover all possible edited NeRFs. To model the distribution of 3D edited NeRFs from 2D edited images, we carefully design a VAE-GAN that encodes images while decoding NeRFs. The latent space is trained to align with a Gaussian distribution, and the NeRFs are supervised through an adversarial loss on their renderings. To ensure the latent code does not depend on 2D viewpoints but truly reflects the 3D edits, we also regularize the latent code through a contrastive learning scheme. Extensive experiments on various editing tasks show that GenN2N, as a universal framework, performs on par with or better than task-specific specialists while possessing flexible generative power. More results on our project page: this https URL
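Two of the latent-space terms described above can be made concrete: a KL term aligns the per-edit latent code with a standard Gaussian, and a contrastive (InfoNCE-style) term pulls together codes inferred from different 2D viewpoints of the same 3D edit while pushing apart codes from different edits. The following is a minimal numpy sketch of just these two terms; all shapes, names, weights, and the choice of cosine similarity are hypothetical illustrations, not the paper's actual implementation (which also includes reconstruction and adversarial losses on NeRF renderings).

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ):
    # encourages the edit-code distribution to align with a Gaussian.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    # InfoNCE-style loss on latent codes: the anchor should match the code
    # from another viewpoint of the same edit (positive), not codes from
    # other edits (negatives). tau is a hypothetical temperature.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# Toy stand-ins for encoder outputs (8-dim codes, purely illustrative).
z_mu, z_logvar = rng.normal(size=8), 0.1 * rng.normal(size=8)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)       # same edit, other view
negatives = [rng.normal(size=8) for _ in range(4)]  # unrelated edits

kl = kl_to_standard_normal(z_mu, z_logvar)
nce = contrastive_loss(anchor, positive, negatives)
print(f"KL term: {kl:.3f}, contrastive term: {nce:.3f}")
```

In a real training loop these terms would be weighted and summed with the reconstruction and adversarial losses, and the codes would come from the image encoder rather than a random generator.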

URL

https://arxiv.org/abs/2404.02788

PDF

https://arxiv.org/pdf/2404.02788.pdf

