Abstract
Visual simultaneous localization and mapping (VSLAM) has broad applications, and state-of-the-art methods leverage deep neural networks for better robustness and applicability. However, fusing these learning-based methods with multi-sensor information remains under-explored, even though such fusion is indispensable for pushing related applications to large-scale and complex scenarios. In this paper, we tightly integrate trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph. In this framework, recurrent optical flow and DBA are performed over sequential images. The Hessian information derived from DBA is fed into a generic factor graph for multi-sensor fusion, which employs a sliding window and supports probabilistic marginalization. A pipeline for visual-inertial integration is developed first, providing the minimum capability for metric-scale localization and mapping. Other sensors (e.g., a global navigation satellite system) are then integrated for drift-free, geo-referenced operation. Extensive tests on both public and self-collected datasets validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments. The code has been made open-source (this https URL).
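The sliding-window marginalization mentioned in the abstract is commonly realized as a Schur complement on the Hessian (information) form of the least-squares system: states leaving the window are eliminated, leaving a Gaussian prior on the remaining states. The following is a minimal numpy sketch of that generic step, not the paper's actual implementation; the function name, block ordering, and dimensions are illustrative assumptions.

```python
import numpy as np

def marginalize(H, b, m):
    """Marginalize the first m states of the Gaussian system H x = b.

    H is the (symmetric, positive-definite) Hessian/information matrix and
    b the corresponding information vector. Returns the Schur complement
    (H_prior, b_prior) acting only on the remaining states, such that
    solving the reduced system reproduces the same estimate for them.
    """
    # Partition into marginalized (m) and remaining (r) blocks.
    Hmm, Hmr = H[:m, :m], H[:m, m:]
    Hrm, Hrr = H[m:, :m], H[m:, m:]
    bm, br = b[:m], b[m:]
    # Eliminate the marginalized states via the Schur complement.
    Hmm_inv = np.linalg.inv(Hmm)
    H_prior = Hrr - Hrm @ Hmm_inv @ Hmr
    b_prior = br - Hrm @ Hmm_inv @ bm
    return H_prior, b_prior
```

Because the Schur complement performs exact Gaussian elimination, the remaining-state estimate from the reduced system matches the one from the full system; in a real sliding-window fusion pipeline the resulting `(H_prior, b_prior)` would be inserted back into the factor graph as a prior factor.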
URL
https://arxiv.org/abs/2403.13714