Abstract
Visual relocalization is a key technique for autonomous driving, robotics, and virtual/augmented reality. After decades of exploration, absolute pose regression (APR), scene coordinate regression (SCR), and hierarchical methods (HMs) have become the most popular frameworks. However, despite their high efficiency, APRs and SCRs have limited accuracy, especially in large-scale outdoor scenes; HMs are accurate but need to store a large number of 2D descriptors for matching, resulting in poor efficiency. In this paper, we propose an efficient and accurate framework, called VRS-NeRF, for visual relocalization with sparse neural radiance fields. Specifically, we introduce an explicit geometric map (EGM) for 3D map representation and an implicit learning map (ILM) for sparse patch rendering. During localization, the EGM provides priors of sparse 2D points, and the ILM uses these sparse points to render patches with sparse NeRFs for matching. This allows us to discard a large number of 2D descriptors and thereby reduce the map size. Moreover, rendering patches only for useful points, rather than for all pixels in the whole image, reduces the rendering time significantly. This framework inherits the accuracy of HMs while avoiding their low efficiency. Experiments on the 7Scenes, CambridgeLandmarks, and Aachen datasets show that our method is much more accurate than APRs and SCRs, and performs close to HMs while being much more efficient.
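To make the rendering-cost argument concrete, the following is a minimal sketch of how rendering patches around sparse points compares with rendering a full image. It is not the authors' implementation: the function names `patch_pixel_coords` and `ray_budget`, the square patch shape, and the clamping at image borders are all assumptions made for illustration, and the keypoints stand in for the sparse 2D point priors that the EGM is described as providing.

```python
import numpy as np


def patch_pixel_coords(keypoints, patch_size, width, height):
    """Pixel coordinates of square patches centered on sparse 2D points.

    keypoints: (N, 2) array-like of (x, y) pixel positions. These stand in
    for the sparse point priors from the EGM (an assumption of this sketch).
    Returns an (N, patch_size, patch_size, 2) integer array of (x, y)
    coordinates, clamped to the image bounds.
    """
    keypoints = np.asarray(keypoints, dtype=np.int64)
    half = patch_size // 2
    offsets = np.arange(patch_size) - half
    # dy varies along rows, dx along columns of each patch.
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    xs = np.clip(keypoints[:, 0, None, None] + dx, 0, width - 1)
    ys = np.clip(keypoints[:, 1, None, None] + dy, 0, height - 1)
    return np.stack([xs, ys], axis=-1)


def ray_budget(num_points, patch_size, width, height):
    """Compare rays needed for sparse patches with rays for a full image.

    Overlapping patches are counted separately, so the sparse count is an
    upper bound on the number of distinct pixels rendered.
    """
    sparse = num_points * patch_size * patch_size
    full = width * height
    return sparse, full, sparse / full


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    coords = patch_pixel_coords([[120, 80], [300, 200]], 15, 640, 480)
    print(coords.shape)
    print(ray_budget(num_points=500, patch_size=15, width=640, height=480))
```

In a pipeline of this kind, each coordinate returned by `patch_pixel_coords` would correspond to one ray passed to the radiance field, so the ratio from `ray_budget` gives a rough sense of how much rendering work is skipped when only patches are rendered.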
URL
https://arxiv.org/abs/2404.09271