Paper Reading AI Learner

Salient Sparse Visual Odometry With Pose-Only Supervision

2024-04-06 16:48:08
Siyu Chen, Kangcheng Liu, Chen Wang, Shenghai Yuan, Jianfei Yang, Lihua Xie

Abstract

Visual Odometry (VO) is vital for the navigation of autonomous systems, providing accurate position and orientation estimates at reasonable costs. While traditional VO methods excel in some conditions, they struggle with challenges like variable lighting and motion blur. Deep learning-based VO, though more adaptable, can face generalization problems in new environments. Addressing these drawbacks, this paper presents a novel hybrid VO framework that leverages pose-only supervision, offering a balanced solution between robustness and the need for extensive labeling. We propose two cost-effective and innovative designs: a self-supervised homographic pre-training for enhancing optical flow learning from pose-only labels, and a random patch-based salient point detection strategy for more accurate optical flow patch extraction. These designs eliminate the need for dense optical flow labels during training and significantly improve the generalization capability of the system in diverse and challenging environments. Our pose-only supervised method achieves competitive performance on standard datasets, and greater robustness and generalization ability in extreme and unseen scenarios, even compared to dense optical flow-supervised state-of-the-art methods.
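
The homographic pre-training idea rests on a convenient fact: warping a single image by a random homography produces a second view whose dense optical flow is known in closed form, so a flow network can be pre-trained with no flow labels at all. Below is a minimal sketch of computing that induced flow field; the function name and NumPy formulation are illustrative, not the paper's actual implementation:

```python
import numpy as np

def homography_flow(H, height, width):
    """Dense optical flow induced by a 3x3 homography H.

    Each pixel (x, y) maps to H @ [x, y, 1]^T after dehomogenization;
    the flow is the resulting displacement. Applying a random H to one
    image therefore yields a training pair with exactly known flow,
    which is the basis of homographic self-supervised pre-training.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    warped = pts @ H.T                                   # apply homography
    warped = warped[..., :2] / warped[..., 2:3]          # dehomogenize
    return warped - pts[..., :2]                         # displacement field

# Example: a pure-translation homography gives a constant flow field.
H = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
flow = homography_flow(H, 4, 5)   # every pixel moves by (+3, -2)
```

In practice one would sample random perspective homographies (not just translations) and train the flow network to regress the known displacement field from the image pair.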

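The random patch-based salient point strategy can be illustrated in the same spirit: scatter patches across the image at random and keep the most salient pixel inside each one, so the extracted flow patches land on trackable structure rather than textureless regions. The sketch below is a hypothetical illustration using gradient magnitude as the saliency score; the paper's actual detector and scoring may differ:

```python
import numpy as np

def salient_points_random_patches(image, num_patches=8, patch=5, seed=0):
    """Pick one high-saliency pixel inside each randomly placed patch.

    Illustrative only: saliency here is image gradient magnitude, and
    patch placement is uniform random. Returns (row, col) coordinates.
    """
    rng = np.random.default_rng(seed)
    gy, gx = np.gradient(image.astype(np.float64))
    score = np.hypot(gx, gy)                  # gradient-magnitude saliency
    h, w = image.shape
    points = []
    for _ in range(num_patches):
        y0 = rng.integers(0, h - patch + 1)   # random patch corner
        x0 = rng.integers(0, w - patch + 1)
        window = score[y0:y0 + patch, x0:x0 + patch]
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        points.append((int(y0 + dy), int(x0 + dx)))
    return points
```

Randomizing patch placement spreads the selected points over the image, which helps the downstream flow estimation observe motion in more of the field of view than a purely score-ranked detector would.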
URL

https://arxiv.org/abs/2404.04677

PDF

https://arxiv.org/pdf/2404.04677.pdf

