Abstract
Moving objects can greatly jeopardize the performance of a visual simultaneous localization and mapping (vSLAM) system that relies on the static-world assumption. Motion removal has proven successful at mitigating this problem. Two main streams of solutions are based on either geometric constraints or deep semantic segmentation neural networks. The former rely on the static-majority assumption, and the latter require labor-intensive pixel-wise annotations. In this paper we propose to adopt a novel weakly-supervised semantic segmentation method. The segmentation mask is obtained from a CNN pre-trained with image-level class labels only. Thus, we leverage the power of deep semantic segmentation CNNs while avoiding the need for expensive annotations for training. We integrate our motion removal approach with the ORB-SLAM2 system. Experimental results on the TUM RGB-D and the KITTI stereo datasets demonstrate the superiority of our method over the state of the art.
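As a rough illustration of how a per-frame segmentation mask can drive motion removal in a feature-based system such as ORB-SLAM2, the sketch below discards ORB keypoints that fall on pixels flagged as potentially dynamic before descriptors are computed. This is not the paper's implementation: the function name `filter_dynamic_keypoints`, the binary-mask convention (nonzero = dynamic), and the use of OpenCV's ORB detector are all assumptions made for illustration.

```python
import cv2

def filter_dynamic_keypoints(image, mask, n_features=1000):
    """Detect ORB keypoints and drop those lying on pixels that a
    segmentation mask flags as potentially dynamic (mask != 0).

    image: grayscale frame (H x W uint8)
    mask:  binary mask of the same size; nonzero marks moving objects
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints = orb.detect(image, None)
    # Keep only keypoints whose pixel location falls in the static region.
    static_kps = [
        kp for kp in keypoints
        if mask[int(round(kp.pt[1])), int(round(kp.pt[0]))] == 0
    ]
    # Compute descriptors for the surviving (static) keypoints only.
    static_kps, descriptors = orb.compute(image, static_kps)
    return static_kps, descriptors
```

Feeding only the surviving keypoints and descriptors into the SLAM front end keeps dynamic objects out of tracking and mapping, which is the general effect the abstract describes.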
URL
https://arxiv.org/abs/1906.03629