Abstract
Typical attempts to improve the capability of visual place recognition techniques include the use of multi-sensor fusion and integration of information over time from image sequences. These approaches can improve performance but have disadvantages, including the need for multiple physical sensors and for calibration processes, both for the multiple sensors and for tuning the image matching sequence length. In this paper we address these shortcomings with a novel "multi-sensor" fusion approach applied to multiple image processing methods for a single visual image stream, combined with a dynamic sequence matching length technique and an automatic processing method weighting scheme. In contrast to conventional single-method approaches, our approach relaxes the performance requirements on any single image processing method, instead requiring only that, within the suite of image processing methods, at least one performs well in any particular environment. In comparison to static sequence-length techniques, the dynamic sequence matching technique enables reduced localization latencies through analysis of recognition quality metrics when re-entering familiar locations. We evaluate our approach on multiple challenging benchmark datasets, achieving superior performance to two state-of-the-art visual place recognition systems across environmental changes including winter to summer, afternoon to morning, and night to day. Across the four benchmark datasets our proposed approach achieves an average F1 score of 0.96, compared to 0.78 for NetVLAD and 0.49 for SeqSLAM. We provide source code for the multi-fusion method and present analysis explaining how superior performance is achieved despite the multiple, disparate image processing methods all being applied to a single source of imagery, rather than to multiple separate sensors.
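The abstract describes two core ingredients: fusing the outputs of several image processing methods applied to one image stream, and matching over sequences of frames rather than single images. The following is a minimal sketch of that idea, not the paper's actual implementation: per-method image difference matrices are min-max normalised, combined with per-method weights (standing in for the paper's automatic weighting scheme), and then scored along fixed-length diagonals as in SeqSLAM-style sequence matching. All function names, the normalisation choice, and the fixed sequence length are illustrative assumptions.

```python
import numpy as np

def fuse_difference_matrices(matrices, weights):
    """Combine per-method (query x reference) distance matrices into one
    fused matrix, after min-max normalising each method's matrix so that
    no single method dominates purely by scale."""
    fused = np.zeros_like(np.asarray(matrices[0], dtype=float))
    for D, w in zip(matrices, weights):
        D = np.asarray(D, dtype=float)
        rng = D.max() - D.min()
        if rng > 0:
            D = (D - D.min()) / rng  # normalise to [0, 1]
        fused += w * D
    return fused / sum(weights)

def sequence_match(fused, seq_len):
    """Score each (query, reference) pair by averaging the fused distances
    along a diagonal of length seq_len ending at that pair (a constant-
    velocity assumption, as in sequence-based place recognition)."""
    n_query, n_ref = fused.shape
    scores = np.full((n_query, n_ref), np.inf)
    for i in range(seq_len - 1, n_query):
        for j in range(seq_len - 1, n_ref):
            diag = [fused[i - k, j - k] for k in range(seq_len)]
            scores[i, j] = sum(diag) / seq_len
    return scores

# Toy example: method A is informative (low distance on the true match
# diagonal), method B is uninformative (constant distance everywhere).
D_a = np.ones((6, 6))
np.fill_diagonal(D_a, 0.0)
D_b = np.full((6, 6), 0.5)

fused = fuse_difference_matrices([D_a, D_b], weights=[1.0, 1.0])
scores = sequence_match(fused, seq_len=3)
best_ref_for_last_query = int(np.argmin(scores[5]))  # -> 5 (true match)
```

The point of the toy example is that the fused matrix still recovers the correct match even though one of the two "methods" contributes nothing, which mirrors the abstract's claim that only one method in the suite needs to perform well in a given environment. The paper's dynamic sequence length and automatic weighting would replace the fixed `seq_len` and hand-set `weights` here.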
URL
https://arxiv.org/abs/1903.03305