Paper Reading AI Learner

Multi-Process Fusion: Visual Place Recognition Using Multiple Image Processing Methods

2019-03-08 06:50:47
Stephen Hausler, Adam Jacobson, Michael Milford

Abstract

Typical attempts to improve the capability of visual place recognition techniques include the use of multi-sensor fusion and integration of information over time from image sequences. These approaches can improve performance but have disadvantages including the need for multiple physical sensors and calibration processes, both for multiple sensors and for tuning the image matching sequence length. In this paper we address these shortcomings with a novel "multi-sensor" fusion approach applied to multiple image processing methods for a single visual image stream, combined with a dynamic sequence matching length technique and an automatic processing method weighting scheme. In contrast to conventional single method approaches, our approach reduces the performance requirements of a single image processing methodology, instead requiring that within the suite of image processing methods, at least one performs well in any particular environment. In comparison to static sequence length techniques, the dynamic sequence matching technique enables reduced localization latencies through analysis of recognition quality metrics when re-entering familiar locations. We evaluate our approach on multiple challenging benchmark datasets, achieving superior performance to two state-of-the-art visual place recognition systems across environmental changes including winter to summer, afternoon to morning and night to day. Across the four benchmark datasets our proposed approach achieves an average F1 score of 0.96, compared to 0.78 for NetVLAD and 0.49 for SeqSLAM. We provide source code for the multi-fusion method and present analysis explaining how superior performance is achieved despite the multiple, disparate, image processing methods all being applied to a single source of imagery, rather than to multiple separate sensors.
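The core idea above — running several disparate image processing methods over one image stream and fusing their place-matching scores with an automatic weighting scheme — can be sketched as follows. This is a minimal illustration of the fusion principle only, not the authors' implementation: the per-method difference matrices, the normalisation, and the hand-supplied weights here are assumptions standing in for the paper's actual processing methods, weighting scheme, and dynamic sequence matching.

```python
import numpy as np

def fuse_difference_matrices(diff_matrices, weights=None):
    """Fuse per-method query-vs-reference difference matrices.

    Each matrix has shape (n_query, n_ref), where a lower value means the
    query image is more similar to that reference image. Each matrix is
    min-max normalised so no single method's score range dominates, then
    the matrices are combined as a weighted sum.
    """
    normalised = []
    for d in diff_matrices:
        lo, hi = d.min(), d.max()
        normalised.append((d - lo) / (hi - lo + 1e-12))
    if weights is None:
        # With no prior information, weight all methods equally.
        weights = np.full(len(normalised), 1.0 / len(normalised))
    return sum(w * d for w, d in zip(weights, normalised))

def best_matches(fused):
    # For each query image, the reference index with the smallest fused
    # difference score is the hypothesised place match.
    return fused.argmin(axis=1)
```

The benefit claimed in the abstract appears here in miniature: a method that performs poorly in the current environment contributes a near-uniform (uninformative) matrix, while any one method that still discriminates well pulls the fused minimum toward the correct reference, so only one method in the suite needs to work at a time.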


URL

https://arxiv.org/abs/1903.03305

PDF

https://arxiv.org/pdf/1903.03305.pdf
