Paper Reading AI Learner

Robust Dense Mapping for Large-Scale Dynamic Environments

2019-05-07 19:38:27
Ioan Andrei Bârsan, Peidong Liu, Marc Pollefeys, Andreas Geiger

Abstract

We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to existing methods, we reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects as separate models, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as background, moving, or potentially moving, thereby ensuring that the system can model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, enabling their accurate reconstruction. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system runs on a PC at approximately 2.5 Hz, with the primary bottleneck being the instance-aware semantic segmentation, a limitation we hope to address in future work. The source code is available from the project website (<a href="http://andreibarsan.github.io/dynslam">this http URL</a>).
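The three-way object classification the abstract describes (background vs. moving vs. potentially moving, using semantic class plus sparse scene flow) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the class set, the function names, and the egomotion-compensated motion threshold are all hypothetical.

```python
# Hypothetical sketch of the classification step described in the abstract:
# combine an instance-aware semantic label with a sparse scene-flow check.

# Semantic classes that *could* move (e.g., a parked car is "potentially moving").
DYNAMIC_CLASSES = {"car", "truck", "bus", "pedestrian", "cyclist"}

def classify_object(semantic_class, scene_flow_residuals, motion_threshold=0.1):
    """Label a detection as 'background', 'moving', or 'potentially moving'.

    scene_flow_residuals: per-point 3D motion magnitudes (meters per frame)
    remaining after compensating for the camera's own egomotion, computed
    from sparse scene flow on the object's instance mask.
    """
    if semantic_class not in DYNAMIC_CLASSES:
        return "background"
    if not scene_flow_residuals:
        # No flow evidence on this instance; treat it conservatively.
        return "potentially moving"
    mean_motion = sum(scene_flow_residuals) / len(scene_flow_residuals)
    return "moving" if mean_motion > motion_threshold else "potentially moving"
```

Under this scheme, a building is background regardless of flow, a parked car (near-zero residual motion) is "potentially moving" and still gets its own reconstruction, and a driving car is "moving" and is reconstructed using its estimated 3D motion.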

URL

https://arxiv.org/abs/1905.02781

PDF

https://arxiv.org/pdf/1905.02781.pdf
