Abstract
Recently, a surge of 3D style transfer methods has been proposed that leverage the scene reconstruction power of a pre-trained neural radiance field (NeRF). To stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which arise as a by-product of the high-frequency details used to improve reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing an encoding-based scene representation with the target style? In this paper, we approach the stylization of sparse-view scenes by disentangling content semantics from style textures. We propose a coarse-to-fine sparse-view scene stylization framework, in which a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method achieves high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in both stylization quality and efficiency.
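The abstract names a "content strength annealing" strategy but gives no formula. A minimal sketch of one plausible reading, assuming a cosine schedule that starts with a strong content-loss weight (to anchor scene semantics) and decays it so style textures dominate later; the function names, weight range, and schedule shape are illustrative assumptions, not the paper's actual method:

```python
import math

def content_weight(step, total_steps, w_start=10.0, w_end=1.0):
    """Cosine-annealed content-loss weight (hypothetical schedule).

    Starts at w_start to preserve content semantics early in
    optimization, and decays toward w_end so the style term
    gradually dominates the total objective.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))

def total_loss(style_loss, content_loss, step, total_steps):
    """Combined stylization objective with annealed content strength."""
    return style_loss + content_weight(step, total_steps) * content_loss
```

In such a scheme, early iterations behave like reconstruction fine-tuning (content-dominated), while later iterations are free to overwrite fine textures with the target style.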
URL
https://arxiv.org/abs/2404.05236