Paper Reading AI Learner

LGU-SLAM: Learnable Gaussian Uncertainty Matching with Deformable Correlation Sampling for Deep Visual SLAM

2024-10-30 17:20:08
Yucheng Huang, Luping Ji, Hudong Liu, Mao Ye

Abstract

Deep visual Simultaneous Localization and Mapping (SLAM) techniques, e.g., DROID, have made significant advancements by leveraging deep visual odometry on dense flow fields. In general, they heavily rely on global visual similarity matching. However, the ambiguous similarity interference in uncertain regions could often lead to excessive noise in correspondences, ultimately misleading SLAM in geometric modeling. To address this issue, we propose a Learnable Gaussian Uncertainty (LGU) matching. It mainly focuses on precise correspondence construction. In our scheme, a learnable 2D Gaussian uncertainty model is designed to associate matching-frame pairs. It could generate input-dependent Gaussian distributions for each correspondence map. Additionally, a multi-scale deformable correlation sampling strategy is devised to adaptively fine-tune the sampling of each direction by a priori look-up ranges, enabling reliable correlation construction. Furthermore, a KAN-bias GRU component is adopted to improve a temporal iterative enhancement for accomplishing sophisticated spatio-temporal modeling with limited parameters. The extensive experiments on real-world and synthetic datasets are conducted to validate the effectiveness and superiority of our method.

Abstract (translated)

深度视觉同时定位与地图构建(SLAM)技术,例如DROID,通过利用密集流场上的深度视觉里程计取得了显著的进步。总体而言,它们严重依赖于全局视觉相似性匹配。然而,在不确定区域中模糊的相似性干扰可能会导致对应关系中的大量噪声,最终误导几何建模过程中的SLAM。为了解决这个问题,我们提出了一种可学习高斯不确定性(LGU)匹配方法。它主要集中在精确对应关系构建上。在我们的方案中,设计了一个可学习的二维高斯不确定性模型来关联匹配帧对,并能够为每个对应关系图生成输入依赖的高斯分布。此外,还制定了一种多尺度变形相关采样策略,通过先验查找范围自适应地微调每个方向上的采样过程,从而实现可靠的关联构建。另外,采用了KAN偏差GRU组件来改进时间迭代增强,以在有限参数下完成复杂的时空建模。我们进行了大量的真实世界和合成数据集实验,验证了所提方法的有效性和优越性。

URL

https://arxiv.org/abs/2410.23231

PDF

https://arxiv.org/pdf/2410.23231.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot