Paper Reading AI Learner

NeRF-Guided Unsupervised Learning of RGB-D Registration

2024-05-01 13:38:03
Zhinan Yu, Zheng Qin, Yijie Tang, Yongjun Wang, Renjiao Yi, Chenyang Zhu, Kai Xu

Abstract

This paper focuses on training a robust RGB-D registration model without ground-truth pose supervision. Existing methods usually adopt a pairwise training strategy based on differentiable rendering, which enforces photometric and geometric consistency between the two registered frames as supervision. However, this frame-to-frame framework suffers from poor multi-view consistency due to factors such as lighting changes, geometric occlusion, and reflective materials. In this paper, we present NeRF-UR, a novel frame-to-model optimization framework for unsupervised RGB-D registration. Instead of frame-to-frame consistency, we leverage a neural radiance field (NeRF) as a global model of the scene and use the consistency between the input frames and the NeRF re-rendered frames for pose optimization. This design significantly improves robustness in scenarios with poor multi-view consistency and provides a better learning signal for the registration model. Furthermore, to bootstrap the NeRF optimization, we create a synthetic dataset, Sim-RGBD, through a photo-realistic simulator to warm up the registration model. By first training the registration model on Sim-RGBD and then fine-tuning it on real data without supervision, our framework distills the capability of feature extraction and registration from simulation to reality. Our method outperforms state-of-the-art counterparts on two popular indoor RGB-D datasets, ScanNet and 3DMatch. Code and models will be released to support reproduction.
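The frame-to-model idea described above, re-rendering the input view from the NeRF scene model at the estimated pose and penalizing photometric and geometric discrepancies, can be sketched as a simple consistency loss. The following is a minimal, hypothetical PyTorch sketch, not the authors' released code: render_rgbd stands in for a differentiable NeRF renderer, and the loss weights and validity mask are illustrative assumptions.

import torch

def frame_to_model_loss(render_rgbd, pose_estimate, rgb_input, depth_input,
                        w_photo=1.0, w_geo=0.1):
    """Consistency between an input RGB-D frame and its NeRF re-rendering.

    render_rgbd: hypothetical differentiable renderer mapping a 4x4 camera
        pose to (rgb, depth) images of shape (3, H, W) and (H, W).
    pose_estimate: 4x4 camera pose predicted by the registration model.
    rgb_input, depth_input: the observed RGB-D frame, same shapes as above.
    """
    rgb_render, depth_render = render_rgbd(pose_estimate)

    # Compare only pixels where both the sensor and the NeRF report valid
    # depth, which sidesteps depth-map holes and unreconstructed regions.
    valid = (depth_render > 0) & (depth_input > 0)

    photo = (rgb_render - rgb_input).abs().mean(dim=0)[valid].mean()  # photometric term
    geo = (depth_render - depth_input).abs()[valid].mean()            # geometric term
    return w_photo * photo + w_geo * geo

Because the NeRF aggregates many frames into a single global scene model, such a loss is less sensitive to per-pair lighting changes, occlusion, and reflections than a frame-to-frame photometric loss, and its gradients flow through the pose estimate back into the registration model.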

Abstract (translated)

This paper focuses on training a robust RGB-D registration model without ground-truth pose supervision. Existing methods usually adopt a pairwise training strategy based on differentiable rendering, enforcing photometric and geometric consistency between the two frames as supervision. However, such frameworks suffer from poor multi-view consistency due to factors such as lighting changes, geometric occlusion, and reflective materials. In this paper, we propose NeRF-UR, a novel frame-to-model optimization framework for unsupervised RGB-D registration. Rather than relying on frame-to-frame consistency, we leverage a neural radiance field (NeRF) as a global model of the scene and use the consistency between the input frames and the NeRF re-rendered frames for pose optimization. This design significantly improves robustness in scenes with poor multi-view consistency and provides a better learning signal for the registration model. Furthermore, to bootstrap the NeRF optimization, we create a synthetic dataset, Sim-RGBD, with a photo-realistic simulator; by first training the registration model on Sim-RGBD and then fine-tuning it on real data without supervision, our framework transfers the capability of feature extraction and registration from simulation to reality. Our method outperforms state-of-the-art counterparts on two popular indoor RGB-D datasets, ScanNet and 3DMatch. Code and models will be released for reproduction.

URL

https://arxiv.org/abs/2405.00507

PDF

https://arxiv.org/pdf/2405.00507.pdf
