Paper Reading AI Learner

Learning Whole-Image Descriptors for Real-time Loop Detection andKidnap Recovery under Large Viewpoint Difference

2019-04-15 11:01:04
Manohar Kuse, Shaojie Shen

Abstract

We present a real-time stereo visual-inertial-SLAM system which is able to recover from complicatedkidnap scenarios and failures online in realtime. We propose to learn the whole-image-descriptorin a weakly supervised manner based on NetVLAD and decoupled convolutions. We analyse thetraining difficulties in using standard loss formulations and propose an allpairloss and show itseffect through extensive experiments. Compared to standard NetVLAD, our network takes an orderof magnitude fewer computations and model parameters, as a result runs about three times faster.We evaluate the representation power of our descriptor on standard datasets with precision-recall.Unlike previous loop detection methods which have been evaluated only on fronto-parallel revisits,we evaluate the performace of our method with competing methods on scenarios involving largeviewpoint difference. Finally, we present the fully functional system with relative computation andhandling of multiple world co-ordinate system which is able to reduce odometry drift, recover fromcomplicated kidnap scenarios and random odometry failures. We open source our fully functional system as an add-on for the popular VINS-Fusion.

Abstract (translated)

我们提出了一种实时立体视觉惯性冲击系统,该系统能够从复杂的绑架场景和在线故障中实时恢复。我们提出了一种基于NetVLAD和去耦卷积的弱监督方式来学习整个图像描述符。分析了标准损失公式在使用过程中的训练难点,提出了一种全过程损失公式,并通过大量实验证明了其效果。与标准netvlad相比,我们的网络计算量和模型参数减少了一个数量级,因此运行速度快了大约三倍。我们用精确召回评估了标准数据集上描述符的表示能力。与以前仅在前向并行重访中评估的循环检测方法不同,我们的网络使用的是在涉及大视点差异的情况下,评估我们的方法与竞争方法的性能。最后,我们提出了一个功能完备的系统,该系统具有多个世界坐标系统的相对计算和处理能力,能够减少里程计的漂移,从复杂的绑架场景和随机里程计故障中恢复。我们将我们的全功能系统作为流行VIN融合的附加组件进行开源。

URL

https://arxiv.org/abs/1904.06962

PDF

https://arxiv.org/pdf/1904.06962.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot