Abstract
Deep visual Simultaneous Localization and Mapping (SLAM) techniques, e.g., DROID, have advanced significantly by leveraging deep visual odometry on dense flow fields. In general, they rely heavily on global visual similarity matching. However, ambiguous similarity interference in uncertain regions often introduces excessive noise into the correspondences, ultimately misleading SLAM in geometric modeling. To address this issue, we propose Learnable Gaussian Uncertainty (LGU) matching, which focuses on precise correspondence construction. In our scheme, a learnable 2D Gaussian uncertainty model is designed to associate matched frame pairs and generate input-dependent Gaussian distributions for each correspondence map. Additionally, a multi-scale deformable correlation sampling strategy is devised to adaptively fine-tune the sampling in each direction using a priori look-up ranges, enabling reliable correlation construction. Furthermore, a KAN-bias GRU component is adopted to improve temporal iterative enhancement, accomplishing sophisticated spatio-temporal modeling with limited parameters. Extensive experiments on real-world and synthetic datasets validate the effectiveness and superiority of our method.
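To make the core idea concrete, below is a minimal sketch, in PyTorch, of how an input-dependent 2D Gaussian uncertainty could reweight a local correlation lookup so that ambiguous matches are suppressed. The paper does not publish this interface; all names (GaussianUncertaintyHead, gaussian_weighted_correlation) and the specific parameterization (per-pixel sigma_x, sigma_y, rho) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a learnable 2D Gaussian uncertainty weighting over a
# local correlation volume (DROID-style lookup). Names and parameterization
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianUncertaintyHead(nn.Module):
    """Predicts per-pixel 2D Gaussian parameters (sigma_x, sigma_y, rho) from features."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 1),  # sigma_x, sigma_y, rho per pixel
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        p = self.head(feat)
        sigma = F.softplus(p[:, :2]) + 1e-3    # positive standard deviations
        rho = torch.tanh(p[:, 2:3]) * 0.99     # correlation coefficient in (-1, 1)
        return torch.cat([sigma, rho], dim=1)  # (B, 3, H, W)

def gaussian_weighted_correlation(corr: torch.Tensor, params: torch.Tensor,
                                  radius: int = 3) -> torch.Tensor:
    """Reweights a local correlation volume with a per-pixel 2D Gaussian.

    corr:   (B, (2r+1)^2, H, W) correlation lookup in a window around each pixel.
    params: (B, 3, H, W) Gaussian parameters from GaussianUncertaintyHead.
    """
    # Offsets (dx, dy) of the lookup window cells, in [-r, r].
    d = torch.arange(-radius, radius + 1, device=corr.device, dtype=corr.dtype)
    dy, dx = torch.meshgrid(d, d, indexing="ij")
    dx = dx.reshape(1, -1, 1, 1)               # (1, (2r+1)^2, 1, 1)
    dy = dy.reshape(1, -1, 1, 1)

    sx, sy, rho = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    # Log-density of a zero-mean bivariate Gaussian evaluated at each offset.
    z = (dx / sx) ** 2 - 2 * rho * (dx / sx) * (dy / sy) + (dy / sy) ** 2
    log_w = -z / (2 * (1 - rho ** 2))
    weight = torch.softmax(log_w, dim=1)       # normalize over the lookup window
    return corr * weight                       # down-weight ambiguous lookups

# Usage: a (2*3+1)^2 = 49-cell correlation lookup at 1/8 input resolution.
feat = torch.randn(1, 128, 48, 64)
corr = torch.randn(1, 49, 48, 64)
weights = GaussianUncertaintyHead(128)(feat)
weighted = gaussian_weighted_correlation(corr, weights, radius=3)
```

The design choice sketched here is simply to let the predicted covariance shape a soft mask over the correlation window, which is one plausible reading of "input-dependent Gaussian distributions for each correspondence map"; the paper's multi-scale deformable sampling and KAN-bias GRU are not modeled.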
URL
https://arxiv.org/abs/2410.23231