SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

2024-04-17 14:23:28
Vincent Cartillier, Grant Schindler, Irfan Essa

Abstract

We present SLAIM (Simultaneous Localization and Implicit Mapping). We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alignment and photometric bundle adjustment. These optimizations are hard to solve because the loss has a narrow basin of attraction in image space (local minima) and no initial correspondences are available. We mitigate these limitations by implementing a Gaussian pyramid filter on top of NeRF, which enables a coarse-to-fine tracking optimization strategy. Furthermore, NeRF systems struggle to converge to the right geometry with limited input views. Prior approaches use a Signed-Distance Function (SDF)-based NeRF and directly supervise SDF values by approximating the ground-truth SDF from depth measurements, which often results in suboptimal geometry. In contrast, our method employs a volume density representation and introduces a novel KL regularizer on the ray termination distribution, constraining scene geometry to consist of empty space and opaque surfaces. Our solution implements both local and global bundle adjustment to produce a robust (coarse-to-fine) and accurate (KL regularizer) SLAM solution. We conduct experiments on multiple datasets (ScanNet, TUM, Replica) showing state-of-the-art results in tracking and in reconstruction accuracy.
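
To make the idea of a KL regularizer on the ray termination distribution concrete, here is a minimal sketch in plain Python. The termination weights use the standard NeRF volume-rendering formula, w_i = T_i (1 - exp(-sigma_i * delta_i)). Everything else is an assumption for illustration and is not taken from the paper: the function names, the narrow Gaussian target centered on a measured depth, the `width` parameter, and the direction of the KL divergence are all hypothetical choices.

```python
import math


def termination_weights(sigmas, deltas):
    """Standard NeRF ray termination weights along one ray.

    sigmas: volume densities at the samples; deltas: spacing between samples.
    """
    weights = []
    transmittance = 1.0  # probability that the ray reaches the current sample
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def kl_to_target(weights, ts, depth, width, eps=1e-10):
    """KL(termination distribution || narrow Gaussian around `depth`).

    Assumption: the target shape, `width`, and the KL direction are
    illustrative only; the abstract does not specify them.
    """
    # Hypothetical target: a discretized, normalized Gaussian at the depth.
    target = [math.exp(-0.5 * ((t - depth) / width) ** 2) for t in ts]
    total = sum(target)
    target = [p / total for p in target]

    # Normalize the termination weights so they form a distribution.
    wsum = sum(weights) + eps
    q = [w / wsum for w in weights]

    # Terms with q_i == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(qi * math.log(qi / (pi + eps))
               for qi, pi in zip(q, target) if qi > 0.0)
```

Under these assumptions, a ray that passes through empty space and stops at a single opaque sample near the measured depth yields a small divergence, while a ray through uniform low-density "fog" spreads its termination mass along the ray and yields a large one. Minimizing such a term would therefore push the density toward the empty-space-plus-opaque-surface geometry the abstract describes.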

URL

https://arxiv.org/abs/2404.11419

PDF

https://arxiv.org/pdf/2404.11419.pdf

