Real-Time Monocular Object-Model Aware Sparse SLAM

2019-03-06 06:24:40
Mehdi Hosseinzadeh, Kejie Li, Yasir Latif, Ian Reid

Abstract

Simultaneous Localization And Mapping (SLAM) is a fundamental problem in mobile robotics. While sparse point-based SLAM methods provide accurate camera localization, the generated maps lack semantic information. On the other hand, state-of-the-art object detection methods provide rich information about entities present in the scene from a single image. This work incorporates a real-time deep-learned object detector into the monocular SLAM framework, representing generic objects as quadrics so that detections can be integrated seamlessly while preserving real-time performance. A finer reconstruction of an object, learned by a CNN, is also incorporated and provides a shape prior for the quadric, leading to further refinement. To capture the dominant structure of the scene, additional planar landmarks are detected by a CNN-based plane detector and modeled as independent landmarks in the map. Extensive experiments support our proposed inclusion of semantic objects and planar structures directly in the bundle adjustment of SLAM (Semantic SLAM), which enriches the reconstructed map semantically while significantly improving camera localization. The performance of our SLAM system is demonstrated in https://youtu.be/UMWXd4sHONw and https://youtu.be/QPQqVrvP0dE .
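
To illustrate why a quadric representation pairs naturally with bounding-box detections, here is a minimal sketch of the standard projective-geometry relation between a dual quadric Q* and its image, the dual conic C* = P Q* P^T. This is not the authors' implementation. The function names, the camera intrinsics, and the example sphere are all assumptions chosen for illustration, and the sketch assumes the object lies fully in front of the camera.

```python
import numpy as np


def ellipsoid_dual_quadric(semi_axes, rotation, center):
    """Build the 4x4 dual quadric Q* of an ellipsoid in world coordinates.

    Hypothetical helper: semi_axes = (a, b, c), rotation is 3x3, center is (3,).
    """
    Z = np.eye(4)
    Z[:3, :3] = rotation
    Z[:3, 3] = center
    a, b, c = semi_axes
    return Z @ np.diag([a * a, b * b, c * c, -1.0]) @ Z.T


def quadric_to_bbox(Q_dual, P):
    """Project a dual quadric with a 3x4 camera matrix P.

    Returns the axis-aligned box (u_min, v_min, u_max, v_max) that encloses
    the projected conic. Assumes the camera is outside the ellipsoid and the
    object is in front of it, so the square roots below are real.
    """
    C = P @ Q_dual @ P.T          # dual conic: tangent lines l satisfy l^T C l = 0
    C = C / C[2, 2]               # fix the projective scale so that C[2, 2] == 1
    # Vertical tangent lines x = u, i.e. l = (1, 0, -u), give a quadratic in u;
    # horizontal tangent lines give the same quadratic in v.
    du = np.sqrt(C[0, 2] ** 2 - C[0, 0])
    dv = np.sqrt(C[1, 2] ** 2 - C[1, 1])
    return (C[0, 2] - du, C[1, 2] - dv, C[0, 2] + du, C[1, 2] + dv)


if __name__ == "__main__":
    # Assumed pinhole intrinsics and a camera at the world origin.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    # Unit sphere 5 m in front of the camera.
    Q = ellipsoid_dual_quadric((1.0, 1.0, 1.0), np.eye(3), np.array([0.0, 0.0, 5.0]))
    print(quadric_to_bbox(Q, P))
```

In a detection-driven system, a residual between a box predicted this way and the detector's box is one plausible way to constrain the quadric parameters during bundle adjustment. The abstract does not specify the exact error term used.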

URL

https://arxiv.org/abs/1809.09149

PDF

https://arxiv.org/pdf/1809.09149.pdf

