Paper Reading AI Learner

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

2024-04-12 05:43:10
Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem

Abstract

The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for multiview stereo (MVS) benchmarks such as ETH3D. In this paper, we aim to create 3D models that provide accurate geometry and view synthesis, partially closing the large geometric performance gap between NeRF and traditional MVS methods. We propose a patch-based approach that effectively leverages monocular surface normal and relative depth predictions. Patch-based ray sampling also enables appearance regularization via normalized cross-correlation (NCC) and structural similarity (SSIM) between randomly sampled virtual views and training views. We further show that "density restrictions" based on sparse structure-from-motion points can greatly improve geometric accuracy at a slight cost in novel view synthesis metrics. Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm on the ETH3D MVS benchmark, suggesting a fruitful research direction for improving the geometric accuracy of NeRF-based models and shedding light on a potential future approach that could enable NeRF-based optimization to eventually outperform traditional MVS.
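The patch-based NCC term mentioned above rewards photometric consistency between corresponding patches rendered from virtual and training views. A minimal sketch of such a patch NCC score is below; this is an illustrative stand-in, not the authors' implementation, and the helper name `patch_ncc` is hypothetical.

```python
import numpy as np

def patch_ncc(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Normalized cross-correlation between two same-sized image patches.

    Scores range from -1 (anti-correlated) to 1 (identical up to an
    affine intensity change), which is why NCC is robust to exposure
    differences between views.
    """
    # Subtract each patch's mean so the score ignores brightness offsets.
    a = a - a.mean()
    b = b - b.mean()
    # Dot product of the centered patches, normalized by their magnitudes.
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

# Identical patches correlate perfectly; an inverted patch anti-correlates.
p = np.arange(16, dtype=float).reshape(4, 4)
print(round(patch_ncc(p, p), 4))   # 1.0
print(round(patch_ncc(p, -p), 4))  # -1.0
```

In a NeRF training loop, a loss such as `1 - patch_ncc(rendered_patch, gt_patch)` would be minimized alongside the usual photometric and monocular-guidance terms.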


URL

https://arxiv.org/abs/2404.08252

PDF

https://arxiv.org/pdf/2404.08252.pdf

