Abstract
Neural Radiance Fields (NeRF) are a representation for 3D reconstruction from multi-view images. Although some recent works have shown preliminary success in editing a reconstructed NeRF with a diffusion prior, they still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of content synthesized by the diffusion model, which hinders the radiance field from converging to crisp, deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift incoherent with the image condition due to auto-encoding errors. Both problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: this https URL
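The abstract does not spell out how the masked adversarial training is implemented, but the core idea — applying an adversarial loss only inside the inpainted region, so the supervised region is untouched by the discriminator signal — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the hinge-loss formulation and the `masked_adversarial_losses` helper are assumptions.

```python
import numpy as np

def masked_adversarial_losses(disc_real, disc_fake, mask):
    """Hinge-style GAN losses restricted to an inpainting mask.

    disc_real, disc_fake: per-pixel discriminator logits (H, W) for
        real and rendered images (hypothetical; actual architecture
        is not specified in the abstract).
    mask: binary (H, W) array, 1 inside the inpainted (uncovered) region.

    Returns (discriminator_loss, generator_loss), each averaged over
    masked pixels only, so supervision outside the mask is unaffected.
    """
    m = mask.astype(np.float32)
    denom = m.sum() + 1e-8  # avoid division by zero for empty masks

    # Hinge discriminator loss: push real logits above +1, fake below -1,
    # but only inside the masked region.
    d_loss = ((np.maximum(0.0, 1.0 - disc_real) * m).sum()
              + (np.maximum(0.0, 1.0 + disc_fake) * m).sum()) / denom

    # Generator tries to raise the discriminator's score on fake pixels.
    g_loss = (-disc_fake * m).sum() / denom
    return d_loss, g_loss

# Toy check: a confident discriminator (real=+2, fake=-2) incurs zero
# hinge loss, while the generator loss equals 2 inside the mask.
mask = np.ones((4, 4))
d, g = masked_adversarial_losses(np.full((4, 4), 2.0),
                                 np.full((4, 4), -2.0), mask)
```

Restricting the loss to the mask is what lets the adversarial term sharpen the synthesized texture without fighting the reconstruction losses in observed regions.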
URL
https://arxiv.org/abs/2404.09995