Learning by Inertia: Self-supervised Monocular Visual Odometry for Road Vehicles

2019-05-05 08:58:35
Chengze Wang, Yuan Yuan, Qi Wang

Abstract

In this paper, we present iDVO (inertia-embedded deep visual odometry), a self-supervised, learning-based monocular visual odometry (VO) method for road vehicles. When modelling the geometric consistency between adjacent frames, most deep VO methods ignore the temporal continuity of the camera pose, which produces severe jagged fluctuations in the estimated velocity curves. Observing that road vehicles move smoothly most of the time, we design an inertia loss function that describes abnormal motion variation and helps the model learn temporal continuity from long-term camera ego-motion. Built on a recurrent convolutional neural network (RCNN) architecture, our method implicitly models the dynamics of road vehicles and their temporal continuity with an extended Long Short-Term Memory (LSTM) block. Furthermore, we develop a dynamic hard-edge mask that handles the non-consistency caused by fast camera motion by blocking the boundary region, which makes the overall non-consistency mask more efficient. The proposed method is evaluated on the KITTI dataset, and the results demonstrate state-of-the-art performance compared with other monocular deep VO and SLAM approaches.
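
The abstract does not state the form of the inertia loss, so the following is only an illustrative sketch of the general idea: penalizing abrupt changes between consecutive motion estimates. The function name `inertia_penalty`, the squared-difference form, and the toy velocity sequences are assumptions made for this example, not the paper's definition.

```python
from typing import Sequence


def inertia_penalty(velocities: Sequence[Sequence[float]]) -> float:
    """Mean squared change between consecutive velocity estimates.

    Hypothetical smoothness term (an assumption, not the paper's formula):
    `velocities` holds one motion vector per frame pair, e.g. [vx, vy, vz].
    """
    if len(velocities) < 2:
        raise ValueError("need at least two velocity estimates")
    total = 0.0
    for prev, curr in zip(velocities, velocities[1:]):
        # Squared Euclidean distance between neighbouring estimates.
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(velocities) - 1)


# Toy data: a constant-velocity sequence and a jagged one.
smooth = [[1.0, 0.0, 0.0] for _ in range(5)]
jagged = [[1.5 if i % 2 else 1.0, 0.0, 0.0] for i in range(5)]

print(inertia_penalty(smooth))
print(inertia_penalty(jagged))
```

Under this assumed form, a constant-velocity trajectory incurs no penalty while a jagged one does, which matches the qualitative behaviour the abstract describes. In a real training setup such a term would be written in a differentiable framework and weighted against the other losses.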

Abstract (translated)

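This paper presents a self-supervised, learning-based monocular visual odometry (VO) method for road vehicles. When modelling the geometric consistency between adjacent frames, most deep VO methods ignore the temporal continuity of the camera pose, which leads to severe jagged fluctuations in the velocity curves. Based on the observation that road vehicles tend to move smoothly most of the time, an inertia loss function describing abnormal motion variation is designed to help the model learn continuity from long-term camera ego-motion. The method is built on a recurrent convolutional neural network (RCNN) architecture and implicitly models road-vehicle dynamics and temporal continuity through an extended Long Short-Term Memory (LSTM) block. In addition, a dynamic hard-edge mask is developed that handles the non-consistency in fast camera motion by blocking the boundary region, making the overall non-consistency mask more efficient. The method is evaluated on the KITTI dataset, and the results show state-of-the-art performance compared with other monocular deep VO and SLAM methods.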

URL

https://arxiv.org/abs/1905.01634

PDF

https://arxiv.org/pdf/1905.01634.pdf

