Paper Reading AI Learner

DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping

2024-03-20 16:20:54
Yuxuan Zhou, Xingxing Li, Shengyu Li, Xuanbin Wang, Shaoquan Feng, Yuxuan Tan

Abstract

Visual simultaneous localization and mapping (VSLAM) has broad applications, with state-of-the-art methods leveraging deep neural networks for better robustness and applicability. However, there is a lack of research in fusing these learning-based methods with multi-sensor information, which could be indispensable to push related applications to large-scale and complex scenarios. In this paper, we tightly integrate the trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph. In the framework, recurrent optical flow and DBA are performed among sequential images. The Hessian information derived from DBA is fed into a generic factor graph for multi-sensor fusion, which employs a sliding window and supports probabilistic marginalization. A pipeline for visual-inertial integration is firstly developed, which provides the minimum ability of metric-scale localization and mapping. Furthermore, other sensors (e.g., global navigation satellite system) are integrated for driftless and geo-referencing functionality. Extensive tests are conducted on both public datasets and self-collected datasets. The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments. The code has been made open-source (this https URL).

Abstract (translated)

视觉同时定位和映射(VSLAM)具有广泛的应用,最先进的方法利用深度神经网络的优点来提高其稳健性和适用性。然而,将这些基于学习的方法与多传感器信息相结合的研究还很少,这对于推动相关应用向大规模和复杂场景实现至关重要。在本文中,我们将通过因子图将可训练的深度密集卷积 bundle adjustment(DBA)与多传感器信息相结合。在框架中,连续光流和 DBA 在序列图像之间执行。从 DBA 获得的 Hessian 信息被输入到通用因子图进行多传感器融合,该框架采用滑动窗口并支持概率边际。首先开发了视觉-惯性整合的流程,提供了最小的大规模局部定位和映射能力。此外,还集成了其他传感器(例如全球导航卫星系统)以实现无漂移和地理参考功能。在公开数据集和自收集数据集上进行了广泛的测试。测试结果证实了我们的方法在大型环境中的卓越定位性能,从而实现了在大型环境中的实时密集映射。该代码已公开开源(此 https URL)。

URL

https://arxiv.org/abs/2403.13714

PDF

https://arxiv.org/pdf/2403.13714.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot