Paper Reading AI Learner

Stylizing Sparse-View 3D Scenes with Hierarchical Neural Representation

2024-04-08 07:01:42
Y. Wang, A. Gao, Y. Gong, Y. Zeng

Abstract

Recently, a surge of 3D style transfer methods has been proposed that leverage the scene reconstruction power of a pre-trained neural radiance field (NeRF). To successfully stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, a by-product of the high-frequency details these models introduce to improve reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing an encoding-based scene representation with a target style? In this paper, we consider the stylization of sparse-view scenes in terms of disentangling content semantics and style textures. We propose a coarse-to-fine sparse-view scene stylization framework, in which a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method achieves high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in both stylization quality and efficiency.
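The abstract mentions an optimization strategy with "content strength annealing," i.e. gradually reducing how strongly the content term constrains the stylization objective. A minimal sketch of one plausible reading of this idea follows; the linear schedule, function names, and default weights are illustrative assumptions, not the authors' exact formulation.

```python
def content_weight(step, total_steps, w_start=1.0, w_end=0.1):
    """Linearly anneal the content-loss weight from w_start down to w_end.

    Early in optimization the content term dominates (preserving scene
    semantics); later the style term takes over (realistic stylization).
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # progress clamped to [0, 1]
    return w_start + (w_end - w_start) * t


def stylization_loss(content_loss, style_loss, step, total_steps):
    """Weighted objective: annealed content term plus a fixed style term."""
    return content_weight(step, total_steps) * content_loss + style_loss
```

At step 0 of 100 the content term carries its full weight (1.0); by the final step it has decayed to 0.1, so gradients increasingly favor the style textures while early iterations lock in content semantics.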


URL

https://arxiv.org/abs/2404.05236

PDF

https://arxiv.org/pdf/2404.05236.pdf

