Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

Abstract

Recent advances in 3D scene representation and novel view synthesis have witnessed the rise of Neural Radiance Fields (NeRFs). Nevertheless, it is not trivial to exploit NeRF for the photorealistic 3D scene stylization task, which aims to generate visually consistent and photorealistic stylized scenes from novel views. Simply coupling NeRF with photorealistic style transfer (PST) results in cross-view inconsistency and degraded stylized view syntheses. Through a thorough analysis, we demonstrate that this non-trivial task can be simplified in a new light: when the appearance representation of a pre-trained NeRF is transformed with a Lipschitz mapping, the consistency and photorealism across source views are seamlessly encoded into the syntheses. This motivates us to build a concise and flexible learning framework, namely LipRF, which upgrades arbitrary 2D PST methods with a Lipschitz mapping tailored for the 3D scene. Technically, LipRF first pre-trains a radiance field to reconstruct the 3D scene, and then emulates the style on each view with 2D PST, using the result as a prior to learn a Lipschitz network that stylizes the pre-trained appearance. Because the Lipschitz condition strongly affects the expressivity of the neural network, we devise an adaptive regularization to balance reconstruction and stylization. A gradual gradient aggregation strategy is further introduced to optimize LipRF in a cost-efficient manner. We conduct extensive experiments to show the high quality and robust performance of LipRF on both photorealistic 3D stylization and object appearance editing.
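
To make the notion of a Lipschitz network concrete, here is a minimal sketch in plain Python. It is not LipRF's architecture, which the abstract does not specify. It shows one common, generic way to bound a network's Lipschitz constant: cap the infinity-norm of every weight matrix by rescaling its rows, and use a 1-Lipschitz activation (ReLU). The function names, the layer sizes (3 to 16 to 3, standing in for an appearance vector) and the per-layer bound of 1.5 are all assumptions chosen for illustration.

```python
import random

def normalize_rows(W, bound):
    """Rescale each row of W so its absolute row sum is at most `bound`.

    The infinity-norm of a matrix is its largest absolute row sum, so this
    caps the layer's Lipschitz constant (w.r.t. the infinity-norm) at `bound`.
    """
    out = []
    for row in W:
        s = sum(abs(w) for w in row)
        scale = min(1.0, bound / s) if s > 0 else 1.0
        out.append([w * scale for w in row])
    return out

def matvec(W, x, b):
    """Affine map W @ x + b for list-of-lists W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    """ReLU is 1-Lipschitz, so it does not increase the overall bound."""
    return [max(0.0, x) for x in v]

def lipschitz_mlp(x, layers):
    """Apply an MLP whose layers are (W, b, bound) triples.

    ReLU is applied between layers but not after the last one.
    """
    h = x
    for i, (W, b, bound) in enumerate(layers):
        h = matvec(normalize_rows(W, bound), h, b)
        if i < len(layers) - 1:
            h = relu(h)
    return h

def lipschitz_bound(layers):
    """Upper bound on the network's Lipschitz constant: product of layer bounds."""
    k = 1.0
    for _, _, bound in layers:
        k *= bound
    return k

def make_layers(dims, bound, seed=0):
    """Hypothetical helper: random weights for illustration only."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((W, b, bound))
    return layers

# Illustrative usage: map a 3-vector "appearance" to a stylized 3-vector.
layers = make_layers([3, 16, 3], bound=1.5)
stylized = lipschitz_mlp([0.2, 0.5, 0.8], layers)
```

With this construction, any two inputs x and y satisfy ||f(x) - f(y)|| <= K ||x - y|| in the infinity-norm, where K is the value returned by `lipschitz_bound`. A smaller K constrains how far the mapping can pull nearby appearance values apart, at the cost of expressivity, which is the trade-off the abstract's adaptive regularization is meant to balance.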

URL

https://arxiv.org/abs/2303.13232

PDF

https://arxiv.org/pdf/2303.13232.pdf