Abstract
Taking on arbitrary tasks the way humans do with a mobile service robot in open-world settings requires holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model that narrows the gap between human and robotic capabilities. The approach adopts fundamental concepts from neuroscience, such as a triplet perception split into recognition, knowledge representation, and knowledge interpretation. A recognition system separates background from foreground to integrate exchangeable image-based object detectors with SLAM; a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control; and knowledge interpretation methods apply spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study evaluates each component's impact on overall performance in a fetch-and-carry scenario across two simulated environments and one real-world environment.
URL
https://arxiv.org/abs/2404.17791