Abstract
Perceptual aliasing and weak textures pose significant challenges to place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, UMF (Unifying Local and Global Multimodal Features), that 1) leverages multimodality through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that re-orders, based on local feature matching, the top-k candidates retrieved using a global representation. Our experiments, particularly on sequences captured in a planetary-analog environment, show that UMF significantly outperforms previous baselines in these challenging aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also evaluate its performance on the widely used RobotCar dataset for broader applicability. Code and models are available at this https URL
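The abstract's two ingredients (cross-modal attention and local-feature re-ranking) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, dimensions, and the mutual-nearest-neighbour matching heuristic are illustrative assumptions, chosen only to show how vision tokens might attend to LiDAR tokens and how top-k retrieval candidates might be re-ordered by local matches.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(vision_feats, lidar_feats):
    """Vision tokens query LiDAR tokens (single-head, no learned projections;
    a real cross-attention block would include Q/K/V projection matrices)."""
    d = vision_feats.shape[-1]
    attn = softmax(vision_feats @ lidar_feats.T / np.sqrt(d))
    return attn @ lidar_feats  # vision tokens enriched with LiDAR context

def rerank_topk(query_locals, db_locals, topk_ids):
    """Re-order top-k candidates (retrieved by a global descriptor) by the
    number of mutual nearest-neighbour local feature matches -- one common
    re-ranking heuristic, assumed here for illustration."""
    def match_count(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        sim = a @ b.T
        nn_ab = sim.argmax(axis=1)   # each query local's best db local
        nn_ba = sim.argmax(axis=0)   # each db local's best query local
        # count matches that agree in both directions
        return int(sum(nn_ba[j] == i for i, j in enumerate(nn_ab)))
    scores = [match_count(query_locals, db_locals[c]) for c in topk_ids]
    return [c for _, c in sorted(zip(scores, topk_ids), key=lambda t: -t[0])]
```

A candidate whose local features genuinely correspond to the query accumulates many mutual matches and rises to the top, which is what makes re-ranking robust against perceptual aliasing: globally similar but geometrically inconsistent places score few mutual matches.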
URL: https://arxiv.org/abs/2403.13395