Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

2023-03-23 13:05:57
Zicheng Zhang, Yinglu Liu, Congying Han, Yingwei Pan, Tiande Guo, Ting Yao

Abstract

Recent advances in 3D scene representation and novel view synthesis have witnessed the rise of Neural Radiance Fields (NeRFs). Nevertheless, it is not trivial to exploit NeRF for the photorealistic 3D scene stylization task, which aims to generate visually consistent and photorealistic stylized scenes from novel views. Simply coupling NeRF with photorealistic style transfer (PST) results in cross-view inconsistency and degraded stylized view syntheses. Through a thorough analysis, we demonstrate that this non-trivial task can be simplified in a new light: when the appearance representation of a pre-trained NeRF is transformed with a Lipschitz mapping, the consistency and photorealism across source views are seamlessly encoded into the syntheses. This motivates us to build a concise and flexible learning framework, LipRF, which upgrades arbitrary 2D PST methods with a Lipschitz mapping tailored for the 3D scene. Technically, LipRF first pre-trains a radiance field to reconstruct the 3D scene, and then emulates the style on each view with 2D PST, using the result as a prior to learn a Lipschitz network that stylizes the pre-trained appearance. Because the Lipschitz condition strongly affects the expressivity of the neural network, we devise an adaptive regularization to balance reconstruction and stylization. A gradual gradient aggregation strategy is further introduced to optimize LipRF in a cost-efficient manner. We conduct extensive experiments to show the high quality and robust performance of LipRF on both photorealistic 3D stylization and object appearance editing.
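
The core idea of a Lipschitz network can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the Lipschitz condition is enforced, so the sketch assumes one common construction, in which each weight matrix is rescaled so that its largest absolute row sum stays below a per-layer bound and the activation (ReLU) is 1-Lipschitz. Under the infinity norm, the product of the per-layer bounds then upper-bounds the Lipschitz constant of the whole network. All names (`LipschitzMLP`, `normalize_rows`, `linf_distance`) and the layer sizes are hypothetical.

```python
import random


def normalize_rows(weight, bound):
    """Scale each row so that its absolute row sum is at most `bound`.

    The largest absolute row sum is the matrix norm induced by the
    infinity norm, so the rescaled linear map is `bound`-Lipschitz.
    """
    out = []
    for row in weight:
        row_sum = sum(abs(w) for w in row)
        scale = min(1.0, bound / row_sum) if row_sum > 0 else 1.0
        out.append([w * scale for w in row])
    return out


def linear(weight, bias, x):
    """Compute W x + b for list-of-lists W and list vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    """Elementwise ReLU, which is 1-Lipschitz."""
    return [max(0.0, v) for v in x]


def linf_distance(a, b):
    """Infinity-norm distance between two vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


class LipschitzMLP:
    """Toy MLP whose layers are rescaled to respect per-layer bounds."""

    def __init__(self, sizes, bounds, seed=0):
        rng = random.Random(seed)
        self.bounds = list(bounds)
        self.weights = [
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)]
                       for n_out in sizes[1:]]

    def lipschitz_bound(self):
        """Product of per-layer bounds: an upper bound, not the exact constant."""
        product = 1.0
        for bound in self.bounds:
            product *= bound
        return product

    def __call__(self, x):
        last = len(self.weights) - 1
        for i, (w, b, c) in enumerate(
                zip(self.weights, self.biases, self.bounds)):
            x = linear(normalize_rows(w, c), b, x)
            if i < last:  # no activation after the output layer
                x = relu(x)
        return x


# Hypothetical usage: map a 3-channel appearance value to a stylized one.
net = LipschitzMLP([3, 8, 3], bounds=[1.5, 2.0])
color_a = [0.20, 0.40, 0.60]
color_b = [0.21, 0.40, 0.60]
change = linf_distance(net(color_a), net(color_b))
limit = net.lipschitz_bound() * linf_distance(color_a, color_b)
print(change <= limit + 1e-9)
```

The bound means that two nearby input appearances cannot be mapped to arbitrarily different outputs, which matches the intuition in the abstract that consistency across source views carries over to the stylized result. Tighter bounds restrict how strongly the mapping can change appearance, which is the expressivity trade-off the abstract addresses with adaptive regularization.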

URL

https://arxiv.org/abs/2303.13232

PDF

https://arxiv.org/pdf/2303.13232.pdf

