Paper Reading AI Learner

Movable-Object-Aware Visual SLAM via Weakly Supervised Semantic Segmentation

2019-06-09 12:50:10
Ting Sun, Yuxiang Sun, Ming Liu, Dit-Yan Yeung

Abstract

Moving objects can greatly jeopardize the performance of a visual simultaneous localization and mapping (vSLAM) system which relies on the static-world assumption. Motion removal have seen successful on solving this problem. Two main streams of solutions are based on either geometry constraints or deep semantic segmentation neural network. The former rely on static majority assumption, and the latter require labor-intensive pixel-wise annotations. In this paper we propose to adopt a novel weakly-supervised semantic segmentation method. The segmentation mask is obtained from a CNN pre-trained with image-level class labels only. Thus, we leverage the power of deep semantic segmentation CNNs, while avoid requiring expensive annotations for training. We integrate our motion removal approach with the ORB-SLAM2 system. Experimental results on the TUM RGB-D and the KITTI stereo datasets demonstrate our superiority over the state-of-the-art.

Abstract (translated)

基于静态世界假设的视觉同步定位与映射(VSLAM)系统中,运动物体会极大地影响其性能。运动消除已经成功地解决了这个问题。两种主要的解决方案都是基于几何约束或深度语义分割神经网络。前者依赖于静态多数假设,后者需要劳动密集的像素级注释。本文提出了一种新的弱监督语义分割方法。分割掩模是从一个CNN获得的预先培训,只有图像级的阶级标签。因此,我们利用了深度语义分割CNN的强大功能,同时避免了培训时需要昂贵的注释。我们将运动消除方法与ORB-SLAM2系统集成。对TUM-RGB-D和Kitti立体数据集的实验结果表明,我们优于最先进的技术。

URL

https://arxiv.org/abs/1906.03629

PDF

https://arxiv.org/pdf/1906.03629.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot