
VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field

2024-04-14 14:26:33
Fei Xue, Ignas Budvytis, Daniel Olmeda Reino, Roberto Cipolla

Abstract

Visual relocalization is a key technique for autonomous driving, robotics, and virtual/augmented reality. After decades of exploration, absolute pose regression (APR), scene coordinate regression (SCR), and hierarchical methods (HMs) have become the most popular frameworks. However, despite their high efficiency, APRs and SCRs have limited accuracy, especially in large-scale outdoor scenes; HMs are accurate but need to store a large number of 2D descriptors for matching, resulting in poor efficiency. In this paper, we propose an efficient and accurate framework, called VRS-NeRF, for visual relocalization with sparse neural radiance field. Specifically, we introduce an explicit geometric map (EGM) for 3D map representation and an implicit learning map (ILM) for sparse patch rendering. In the localization process, the EGM provides priors of sparse 2D points, and the ILM uses these sparse points to render patches with sparse NeRFs for matching. This allows us to discard a large number of 2D descriptors and thus reduce the map size. Moreover, rendering patches only for useful points, rather than for all pixels in the whole image, reduces the rendering time significantly. The framework inherits the accuracy of HMs while avoiding their low efficiency. Experiments on the 7Scenes, CambridgeLandmarks, and Aachen datasets show that our method achieves much better accuracy than APRs and SCRs and performance close to HMs, while being much more efficient.
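The abstract only sketches the pipeline at a high level. The Python snippet below is a minimal, hedged reading of it, not the authors' implementation: the `egm.propose_correspondences` and `ilm.render_patches` interfaces, the patch-matching step, and all names are hypothetical placeholders; only the general pattern (sparse 2D/3D priors from the geometric map, NeRF-rendered patches at those points, then PnP + RANSAC) follows the abstract.

```python
# Hedged sketch of a VRS-NeRF-style localization pipeline, assuming:
#   - `egm` (explicit geometric map) can propose sparse 3D points and their
#     prior 2D locations in the query view (hypothetical interface),
#   - `ilm` (implicit learning map) can render a small patch per 3D point
#     with a sparse NeRF (hypothetical interface).
import numpy as np
import cv2


def localize(query_image, camera_matrix, egm, ilm, reproj_thresh=8.0):
    """Estimate the 6-DoF pose of `query_image` (illustrative only)."""
    # 1) EGM: sparse 3D map points and prior 2D keypoints in the query image.
    points_3d, keypoints_2d = egm.propose_correspondences(query_image)  # hypothetical call

    # 2) ILM: render one small patch per 3D point instead of a full image.
    rendered_patches = ilm.render_patches(points_3d)                    # hypothetical call

    # 3) Compare rendered patches with patches around the prior keypoints,
    #    keeping only correspondences whose patches agree.
    query_patches = extract_patches(query_image, keypoints_2d)
    keep = match_patches(rendered_patches, query_patches)

    # 4) Standard PnP + RANSAC on the surviving 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d[keep].astype(np.float64),
        keypoints_2d[keep].astype(np.float64),
        camera_matrix, None,
        reprojectionError=reproj_thresh,
    )
    return ok, rvec, tvec


def extract_patches(image, keypoints, size=16):
    """Crop a size x size patch around each 2D keypoint (clamped to the image)."""
    h, w = image.shape[:2]
    half = size // 2
    patches = []
    for x, y in keypoints.astype(int):
        x0 = int(np.clip(x - half, 0, w - size))
        y0 = int(np.clip(y - half, 0, h - size))
        patches.append(image[y0:y0 + size, x0:x0 + size])
    return np.stack(patches)


def match_patches(rendered, observed, thresh=0.5):
    """Placeholder patch check: normalized cross-correlation above a threshold."""
    scores = []
    for a, b in zip(rendered, observed):
        a = (a - a.mean()) / (a.std() + 1e-6)
        b = (b - b.mean()) / (b.std() + 1e-6)
        scores.append(float((a * b).mean()))
    return np.array(scores) > thresh
```

The point of the sketch is the efficiency argument in the abstract: descriptors are replaced by patches rendered on demand at sparse points, so the map stores geometry plus a NeRF rather than per-point 2D descriptors, and rendering cost scales with the number of candidate points rather than with image resolution.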


URL

https://arxiv.org/abs/2404.09271

PDF

https://arxiv.org/pdf/2404.09271.pdf

