Paper Reading AI Learner

Robust tightly-coupled pose estimation based on monocular vision, inertia and wheel speed

2020-03-03 13:17:28
Peng Gang, Lu Zezao, Chen Bocheng, Chen Shanliang, He Dingxin

Abstract

Visual SLAM methods are widely used for self-localization and mapping in complex environments. Visual-inertial SLAM, which combines a camera with an IMU, can significantly improve robustness and makes the scale weakly observable, whereas monocular visual SLAM alone is scale-unobservable. For ground mobile robots, introducing a wheel speed sensor solves the weak scale observability problem and improves robustness under abnormal conditions. In this paper, a multi-sensor fusion SLAM algorithm using monocular vision, inertial, and wheel speed measurements is proposed. The sensor measurements are combined in a tightly-coupled manner, and a nonlinear optimization method is used to maximize the posterior probability and solve for the optimal state estimate. Loop detection and back-end optimization are added to reduce or even eliminate the cumulative error of the estimated poses, ensuring global consistency of the trajectory and map. The main contributions are: a wheel odometer pre-integration algorithm that combines chassis speed with IMU angular velocity and avoids repeated integration caused by linearization-point changes during iterative optimization; and a state initialization based on the wheel odometer and IMU that quickly and reliably computes the initial state values required by the state estimator in both stationary and moving conditions. Comparative experiments were carried out in room-scale scenes, building-scale scenes, and under visual loss. The results show that the proposed algorithm achieves high accuracy, with 2.2 m of cumulative error after 812 m of motion (0.28%, loop-closure optimization disabled), and strong robustness, localizing effectively even under sensor failures such as visual loss. Both the accuracy and robustness of the proposed method are superior to those of monocular visual-inertial SLAM and traditional wheel odometry.
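The wheel odometer pre-integration mentioned in the abstract accumulates the relative motion between two keyframes once, from the chassis speed and the IMU angular velocity, so the increment does not have to be re-integrated when the linearization point changes during iterative optimization. The sketch below is a minimal illustration of that idea under simplifying assumptions (planar motion, forward chassis speed only, yaw rate taken from the gyroscope, fixed sampling interval dt); the function and variable names are hypothetical and not taken from the paper.

# Minimal sketch of wheel-odometer pre-integration between two keyframes,
# assuming planar motion, body-frame forward speeds v from the wheel
# encoders, and yaw rates w from the IMU gyroscope, sampled at interval dt.
# Names are illustrative only, not from the paper.
import numpy as np

def preintegrate_wheel_odometry(speeds, yaw_rates, dt):
    """Accumulate the relative pose increment (dx, dy, dtheta) expressed
    in the frame of the first keyframe, independent of the global state."""
    dx, dy, dtheta = 0.0, 0.0, 0.0
    for v, w in zip(speeds, yaw_rates):
        # integrate position in the local (pre-integration) frame
        dx += v * np.cos(dtheta) * dt
        dy += v * np.sin(dtheta) * dt
        # integrate heading from the gyroscope yaw rate
        dtheta += w * dt
    return dx, dy, dtheta

# Example: 1 s of motion at 0.5 m/s forward speed with a gentle turn
speeds = [0.5] * 100
yaw_rates = [0.05] * 100
print(preintegrate_wheel_odometry(speeds, yaw_rates, dt=0.01))

Because the increment (dx, dy, dtheta) is expressed in the frame of the first keyframe, it remains valid no matter how the global pose estimates of the two keyframes are updated during optimization, which is what makes repeated integration unnecessary.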


URL

https://arxiv.org/abs/2003.01496

PDF

https://arxiv.org/pdf/2003.01496.pdf

