Abstract
This paper focuses on training a robust RGB-D registration model without ground-truth pose supervision. Existing methods usually adopt a pairwise training strategy based on differentiable rendering, which enforces photometric and geometric consistency between the two registered frames as supervision. However, this frame-to-frame framework suffers from poor multi-view consistency due to factors such as lighting changes, geometric occlusion, and reflective materials. In this paper, we present NeRF-UR, a novel frame-to-model optimization framework for unsupervised RGB-D registration. Instead of frame-to-frame consistency, we leverage a neural radiance field (NeRF) as a global model of the scene and use the consistency between the input frames and the NeRF-rerendered frames for pose optimization. This design significantly improves robustness in scenarios with poor multi-view consistency and provides a better learning signal for the registration model. Furthermore, to bootstrap the NeRF optimization, we create a synthetic dataset, Sim-RGBD, through a photo-realistic simulator to warm up the registration model. By first training the registration model on Sim-RGBD and then fine-tuning it on real data without supervision, our framework distills the capability of feature extraction and registration from simulation to reality. Our method outperforms state-of-the-art counterparts on two popular indoor RGB-D datasets, ScanNet and 3DMatch. Code and models will be released to facilitate reproduction.
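The core idea above can be illustrated with a minimal sketch: render a frame from the scene model at a candidate pose, compare it against the observed input frame with a photometric loss, and descend on the pose. This is a hypothetical toy, not the paper's implementation: the real method backpropagates through NeRF volume rendering over 6-DoF poses, whereas here a 1-D translational "pose", a synthetic 1-D "image", and finite-difference gradients stand in for all of that.

```python
import numpy as np

def render(pose, xs):
    # Stand-in for NeRF rerendering: a fixed intensity pattern shifted by the pose.
    return np.sin(xs - pose)

def photometric_loss(pose, observed, xs):
    # Consistency between the input frame and the frame rerendered at `pose`.
    return np.mean((render(pose, xs) - observed) ** 2)

def optimize_pose(observed, xs, init=0.0, lr=0.5, steps=200, eps=1e-4):
    # Gradient descent on the pose; finite differences replace autodiff here.
    pose = init
    for _ in range(steps):
        g = (photometric_loss(pose + eps, observed, xs)
             - photometric_loss(pose - eps, observed, xs)) / (2 * eps)
        pose -= lr * g
    return pose

xs = np.linspace(0, 2 * np.pi, 256, endpoint=False)
true_pose = 0.7                      # hypothetical ground-truth offset
observed = render(true_pose, xs)     # the "input frame"
estimated = optimize_pose(observed, xs)
```

The frame-to-model point is that `render` queries one persistent scene model, so every frame is checked against the same global geometry and appearance rather than against a single, possibly inconsistent, neighboring frame.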
URL
https://arxiv.org/abs/2405.00507