Abstract
Data efficiency in robotic skill acquisition is crucial for operating robots in varied small-batch assembly settings. To operate in such environments, robots must have robust obstacle avoidance and versatile goal conditioning acquired from only a few simple demonstrations. Existing approaches, however, fall short of these requirements. Deep reinforcement learning (RL) enables a robot to learn complex manipulation tasks but is often limited to small task spaces in the real world due to sample inefficiency and safety concerns. Motion planning (MP) can generate collision-free paths in obstructed environments, but it cannot solve complex manipulation tasks and requires goal states, which are often specified by a user or an object-specific pose estimator. In this work, we propose a system for efficient skill acquisition that uses an object-centric generative model (OCGM) for versatile goal identification. The identified goal is passed to MP, which is combined with RL to solve complex manipulation tasks in obstructed environments. Specifically, OCGM enables one-shot target object identification and re-identification in new scenes, allowing MP to guide the robot to the target object while avoiding obstacles. This is combined with a skill transition network, which bridges the gap between the terminal states of MP and the feasible start states of a sample-efficient RL policy. Experiments demonstrate that our OCGM-based one-shot goal identification achieves accuracy competitive with baseline approaches and that our modular framework outperforms competitive baselines, including a state-of-the-art RL algorithm, by a significant margin on complex manipulation tasks in obstructed environments.
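The staged structure described in the abstract (goal identification, then motion planning, then a transition step, then an RL policy) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class `ModularSkill`, its field names, the `straight_line` planner, and the use of plain coordinate tuples as states are hypothetical stand-ins and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical state type: a 3D position. A real system would use richer states.
State = Tuple[float, ...]


@dataclass
class ModularSkill:
    """Chains four stages; each field is a stand-in, not the paper's model."""
    identify_goal: Callable[[object, object], State]  # stand-in for OCGM goal identification
    plan_path: Callable[[State, State], List[State]]  # stand-in for the motion planner
    transition: Callable[[State], State]              # stand-in for the skill transition network
    rl_policy: Callable[[State], State]               # stand-in for an RL policy rollout

    def run(self, demo_image: object, scene_image: object, start: State) -> List[State]:
        # 1. Identify the goal in the new scene from a single demonstration image.
        goal = self.identify_goal(demo_image, scene_image)
        # 2. Plan a path from the start state to the identified goal.
        path = self.plan_path(start, goal)
        # 3. Map the planner's terminal state to a start state for the policy.
        rl_start = self.transition(path[-1])
        # 4. Let the policy finish the manipulation from that start state.
        final = self.rl_policy(rl_start)
        return path + [rl_start, final]


def straight_line(start: State, goal: State, steps: int = 4) -> List[State]:
    """Toy planner: linear interpolation with no obstacle checking (assumption)."""
    return [
        tuple(s + (g - s) * i / steps for s, g in zip(start, goal))
        for i in range(steps + 1)
    ]
```

With toy callables plugged in, for example `ModularSkill(lambda demo, scene: (1.0, 2.0, 0.0), straight_line, lambda s: s, lambda s: s).run(None, None, (0.0, 0.0, 0.0))`, the call returns the planned waypoints followed by the transition and policy states. The point of the sketch is only that each stage can be swapped independently, which is the kind of modularity the abstract describes.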
URL
https://arxiv.org/abs/2303.03365