Abstract
We present a real-time stereo visual-inertial SLAM system that can recover online from complicated kidnap scenarios and failures. We propose to learn the whole-image descriptor in a weakly supervised manner, based on NetVLAD and decoupled convolutions. We analyse the training difficulties that arise with standard loss formulations, propose an allpairloss, and show its effect through extensive experiments. Compared to standard NetVLAD, our network requires an order of magnitude fewer computations and model parameters, and as a result runs about three times faster. We evaluate the representation power of our descriptor on standard datasets using precision-recall. Unlike previous loop detection methods, which have been evaluated only on fronto-parallel revisits, we evaluate the performance of our method against competing methods on scenarios involving large viewpoint differences. Finally, we present the fully functional system, with relative computation and handling of multiple world coordinate systems, which is able to reduce odometry drift and to recover from complicated kidnap scenarios and random odometry failures. We open source our fully functional system as an add-on for the popular VINS-Fusion.
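To give a rough sense of where a parameter reduction of this kind can come from, the sketch below compares the weight count of one standard convolution layer with that of a depthwise-separable one. It assumes that "decoupled convolutions" refers to depthwise-separable convolutions (a depthwise k x k filter followed by a 1 x 1 pointwise filter). The function names and layer sizes are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
# Minimal sketch: per-layer weight counts, ignoring biases.
# Assumption: "decoupled" means depthwise-separable convolution.
# The layer sizes below are hypothetical, chosen only for illustration.


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = std / sep

print(f"standard:  {std} weights")
print(f"separable: {sep} weights")
print(f"reduction: {ratio:.1f}x")
```

For this hypothetical 3 x 3 layer with 256 input and output channels, the separable form needs between 8 and 9 times fewer weights. This is a per-layer illustration only and does not reproduce the whole-network figures reported in the abstract.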
URL
https://arxiv.org/abs/1904.06962