A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields

2025-07-06 14:48:10
Aoxiang Fan, Corentin Dumery, Nicolas Talabot, Pascal Fua

Abstract

Neural Radiance Fields (NeRF) has emerged as a compelling framework for scene representation and 3D recovery. To improve its performance on real-world data, depth regularization has proven to be among the most effective approaches. However, depth estimation models not only require expensive 3D supervision during training but also suffer from generalization issues; as a result, their depth estimates can be erroneous in practice, especially for outdoor unbounded scenes. In this paper, we propose to regularize NeRF training with view-consistent distributions rather than fixed depth estimates. Specifically, for the 3D points sampled along each ray, the distribution is computed from both low-level color features and high-level features distilled from foundation models, evaluated at the 2D pixel locations where those points project. Sampling from these view-consistency distributions imposes an implicit regularization on NeRF training. We also introduce a depth-pushing loss that works in conjunction with the sampling technique to jointly provide effective regularization and eliminate the failure modes. Extensive experiments on various scenes from public datasets demonstrate that the proposed method generates significantly better novel view synthesis results than state-of-the-art NeRF variants and existing depth regularization methods.
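
The abstract describes the mechanism only at a high level, so below is a minimal PyTorch sketch of the idea as stated: project the 3D points sampled along a ray into nearby training views, compare per-view features at the projected pixel locations, turn the agreement scores into a per-ray distribution, and resample points from it. Everything here is an illustrative assumption, not the authors' implementation: the function names, the cosine-similarity scoring, the softmax temperature `tau`, and the dummy per-view feature maps (standing in for the paper's color and distilled foundation-model features) are not from the paper's code, and the depth-pushing loss is omitted because its form is not specified in the abstract.

```python
# Minimal sketch of view-consistency-weighted ray sampling (assumed PyTorch).
import torch
import torch.nn.functional as F

def project_points(points, K, w2c):
    """Project Nx3 world points into a camera; returns Nx2 pixel coords."""
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)  # Nx4
    cam = (w2c @ pts_h.T).T[:, :3]                                       # Nx3 camera coords
    uv = (K @ cam.T).T                                                   # Nx3
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)

def bilinear_sample(feat_map, uv):
    """Sample a CxHxW feature map at Nx2 pixel coords via grid_sample."""
    _, h, w = feat_map.shape
    grid = uv.clone()
    grid[:, 0] = 2.0 * uv[:, 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    grid[:, 1] = 2.0 * uv[:, 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    out = F.grid_sample(feat_map[None], grid.view(1, -1, 1, 2),
                        align_corners=True)        # 1xCxNx1
    return out[0, :, :, 0].T                       # NxC

def view_consistency_distribution(points, views, tau=0.1):
    """Score each candidate 3D point by feature agreement across views,
    then normalize the scores into a per-ray sampling distribution."""
    feats = []
    for K, w2c, feat_map in views:
        uv = project_points(points, K, w2c)
        feats.append(F.normalize(bilinear_sample(feat_map, uv), dim=-1))
    # Average pairwise cosine similarity between views, per point.
    sim = torch.zeros(points.shape[0])
    n_pairs = 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            sim += (feats[i] * feats[j]).sum(-1)
            n_pairs += 1
    return F.softmax(sim / (n_pairs * tau), dim=0)

# Usage: draw extra samples at view-consistent depths along one ray.
points = torch.rand(64, 3)                    # 64 candidate points (dummy data)
views = [(torch.eye(3), torch.eye(4), torch.rand(16, 32, 32)) for _ in range(3)]
probs = view_consistency_distribution(points, views)
idx = torch.multinomial(probs, num_samples=16, replacement=True)
resampled = points[idx]                       # biased toward consistent depths
```

Sampling from `probs` rather than supervising with a single predicted depth is what makes the regularization implicit: no view has to commit to one depth value, and points that several views agree on are simply visited more often during training.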

URL

https://arxiv.org/abs/2507.04408

PDF

https://arxiv.org/pdf/2507.04408.pdf

