Paper Reading AI Learner

MUTE-SLAM: Real-Time Neural SLAM with Multiple Tri-Plane Hash Representations

2024-03-26 14:53:24
Yifan Yan, Ruomin He, Zhenghua Liu

Abstract

We introduce MUTE-SLAM, a real-time neural RGB-D SLAM system employing multiple tri-plane hash-encodings for efficient scene representation. MUTE-SLAM effectively tracks camera positions and incrementally builds a scalable multi-map representation for both small and large indoor environments. It dynamically allocates sub-maps for newly observed local regions, enabling constraint-free mapping without prior scene information. Unlike traditional grid-based methods, we use three orthogonal axis-aligned planes for hash-encoding scene properties, significantly reducing hash collisions and the number of trainable parameters. This hybrid approach not only speeds up convergence but also enhances the fidelity of surface reconstruction. Furthermore, our optimization strategy concurrently optimizes all sub-maps intersecting with the current camera frustum, ensuring global consistency. Extensive testing on both real-world and synthetic datasets has shown that MUTE-SLAM delivers state-of-the-art surface reconstruction quality and competitive tracking performance across diverse indoor settings. The code will be made public upon acceptance of the paper.
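The tri-plane hash encoding described above can be illustrated with a minimal sketch: a 3D point is projected onto the three orthogonal axis-aligned planes (xy, xz, yz), each 2D projection indexes a hash table of learnable features, and the per-plane features are combined. This is a hypothetical NumPy illustration, not the authors' implementation; the table size, resolution, feature dimension, and the summing of plane features are all assumptions for clarity.

```python
# Hypothetical sketch of a tri-plane hash-encoding lookup (not the authors'
# code): project a 3D point onto the xy, xz, and yz planes, hash each 2D
# grid coordinate into a feature table, and sum the three feature vectors.
import numpy as np

TABLE_SIZE = 2 ** 14   # entries per plane's hash table (assumed)
FEATURE_DIM = 4        # feature channels per entry (assumed)
RESOLUTION = 128       # grid resolution on each plane (assumed)

rng = np.random.default_rng(0)
# One learnable feature table per orthogonal plane: xy, xz, yz.
tables = rng.normal(size=(3, TABLE_SIZE, FEATURE_DIM)).astype(np.float32)

# Large primes for spatial hashing, as in Instant-NGP-style encodings.
PRIMES = np.array([1, 2654435761], dtype=np.uint64)

def hash2d(ij: np.ndarray) -> np.ndarray:
    """Hash integer 2D grid coordinates into a table index."""
    h = (ij.astype(np.uint64) * PRIMES).sum(axis=-1)
    return (h % TABLE_SIZE).astype(np.int64)

def triplane_encode(p: np.ndarray) -> np.ndarray:
    """Encode a point p in [0, 1)^3 as the sum of three plane features."""
    q = np.floor(p * RESOLUTION).astype(np.int64)   # nearest grid cell
    planes = [q[[0, 1]], q[[0, 2]], q[[1, 2]]]      # xy, xz, yz projections
    feats = [tables[k][hash2d(ij)] for k, ij in enumerate(planes)]
    return np.sum(feats, axis=0)

feat = triplane_encode(np.array([0.3, 0.6, 0.9]))
print(feat.shape)  # (4,)
```

Because each plane hashes only 2D coordinates, far fewer cells compete for the same table entry than in a 3D grid hash, which is consistent with the paper's claim of reduced hash collisions and fewer trainable parameters.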


URL

https://arxiv.org/abs/2403.17765

PDF

https://arxiv.org/pdf/2403.17765.pdf

