Abstract
Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. While sparse point-based SLAM methods provide accurate camera localization, the maps they generate lack semantic information. State-of-the-art object detection methods, on the other hand, provide rich information about the entities present in a scene from a single image. This work incorporates a real-time deep-learned object detector into a monocular SLAM framework, representing generic objects as quadrics so that detections can be integrated seamlessly while retaining real-time performance. A finer reconstruction of each object, learned by a CNN, is also incorporated and provides a shape prior that further refines the quadric. To capture the dominant structure of the scene, additional planar landmarks are detected by a CNN-based plane detector and modeled as independent landmarks in the map. Extensive experiments support our proposed inclusion of semantic objects and planar structures directly in the bundle adjustment of SLAM (Semantic SLAM), which enriches the reconstructed map semantically while significantly improving camera localization. The performance of our SLAM system is demonstrated in the videos at https://youtu.be/UMWXd4sHONw and https://youtu.be/QPQqVrvP0dE .
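To illustrate the quadric representation mentioned above: a 3D object can be modeled as an ellipsoid whose dual quadric Q* (a symmetric 4x4 matrix) projects into an image as a dual conic via C* = P Q* P^T, where P = K [R | t] is the 3x4 camera projection matrix. This makes detections easy to relate to the map, since the projected conic can be compared against a detector's bounding box. The sketch below is a minimal NumPy illustration of this standard projection formula, not the authors' implementation; the helper names and the example camera intrinsics are hypothetical.

```python
import numpy as np

def dual_quadric(semi_axes, R, t):
    """Build the dual quadric Q* of an ellipsoid with the given
    semi-axis lengths, oriented by rotation R and centred at t.
    (Hypothetical helper, not from the paper's code.)"""
    # Canonical dual quadric of an axis-aligned ellipsoid at the origin:
    # diag(a^2, b^2, c^2, -1).
    Qc = np.diag(np.append(np.asarray(semi_axes) ** 2, -1.0))
    Z = np.eye(4)            # 4x4 rigid transform placing the ellipsoid
    Z[:3, :3] = R
    Z[:3, 3] = t
    return Z @ Qc @ Z.T

def project_quadric(Qstar, K, R_cw, t_cw):
    """Project a dual quadric into the image: C* = P Q* P^T."""
    P = K @ np.hstack([R_cw, t_cw.reshape(3, 1)])  # 3x4 projection matrix
    Cstar = P @ Qstar @ P.T                        # 3x3 dual conic
    return Cstar / Cstar[2, 2]                     # scale normalisation

# Example: ellipsoid 4 m in front of a camera at the origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])   # hypothetical intrinsics
Q = dual_quadric([0.3, 0.2, 0.5], np.eye(3), np.array([0.0, 0.0, 4.0]))
C = project_quadric(Q, K, np.eye(3), np.zeros(3))
```

The resulting 3x3 symmetric matrix C describes an ellipse in the image; in quadric-based SLAM systems, residuals between this ellipse and detected bounding boxes can then be minimized inside bundle adjustment.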
URL
https://arxiv.org/abs/1809.09149