Implicit Shape Model Trees: Recognition of 3-D Indoor Scenes and Prediction of Object Poses for Mobile Robots

2023-01-25 16:20:56
Pascal Meißner, Rüdiger Dillmann

Abstract

For a mobile robot, we present an approach to recognize scenes in arrangements of objects distributed over cluttered environments. Recognition is made possible by letting the robot alternately search for objects and assign found objects to scenes. Our scene model "Implicit Shape Model (ISM) trees" allows us to solve these two tasks together. For the ISM trees, this article presents novel algorithms for recognizing scenes and predicting the poses of searched objects. We define scenes as sets of objects, where some objects are connected by 3-D spatial relations. In previous work, we recognized scenes using single ISMs. However, these ISMs were prone to false positives. To address this problem, we introduced ISM trees, a hierarchical model that includes multiple ISMs. Through the recognition algorithm it contributes, this article ultimately enables the use of ISM trees in scene recognition. We intend to enable users to generate ISM trees from object arrangements demonstrated by humans. The lack of a suitable algorithm is overcome by the introduction of an ISM tree generation algorithm. In scene recognition, it is usually assumed that image data is already available. However, this is not always the case for robots. For this reason, we combined scene recognition and object search in previous work. However, we did not provide an efficient algorithm to link the two tasks. This article introduces such an algorithm that predicts the poses of searched objects with relations. Experiments show that our overall approach enables robots to find and recognize object arrangements that cannot be perceived from a single viewpoint.

URL

https://arxiv.org/abs/2301.10672

PDF

https://arxiv.org/pdf/2301.10672.pdf

