Constrained Reinforcement Learning for Dexterous Manipulation

Abstract

Existing learning approaches to dexterous manipulation use demonstrations or interactions with the environment to train black-box neural networks that provide little control over how the robot learns the skills or how it performs after training. These approaches pose significant challenges when implemented on physical platforms because, during the initial stages of training, the robot's behavior can be erratic and potentially harmful to its own hardware, the environment, or any humans in the vicinity. A potential way to address these limitations is to add constraints during learning that restrict and guide the robot's behavior during training as well as during rollouts. Inspired by the success of constrained approaches in other domains, we investigate the effects of adding position-based constraints to a 24-DOF robot hand learning to perform object relocation using Constrained Policy Optimization. We find that a simple geometric constraint can ensure the robot learns to move towards the object sooner than without constraints. Further, training with this constraint requires a similar number of samples as its unconstrained counterpart to master the skill. These findings shed light on how simple constraints can help robots achieve sensible and safe behavior quickly and ease concerns surrounding hardware deployment. We also investigate the effects of the strictness of these constraints and report findings that provide insights into how different degrees of strictness affect learning outcomes. Our code is available at this https URL.
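
To make the idea of a position-based constraint concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the shape of the geometric constraint, so the sphere around the object, the `radius` parameter, and the function names `constraint_cost`, `discounted_cost`, and `satisfies_limit` are all assumptions made for illustration. What the sketch does rely on is the general form used by Constrained Policy Optimization: a per-step cost signal whose discounted sum is kept below a limit.

```python
import math
from typing import Sequence


def constraint_cost(palm_pos: Sequence[float],
                    object_pos: Sequence[float],
                    radius: float) -> float:
    """Hypothetical per-step cost: 1.0 if the palm is outside a sphere
    of `radius` around the object, 0.0 otherwise (assumed geometry)."""
    return 1.0 if math.dist(palm_pos, object_pos) > radius else 0.0


def discounted_cost(costs: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of per-step costs over one trajectory."""
    total = 0.0
    for c in reversed(costs):
        total = c + gamma * total
    return total


def satisfies_limit(costs: Sequence[float], limit: float,
                    gamma: float = 0.99) -> bool:
    """Check one trajectory's discounted cost against a cost limit.
    A constrained policy update would enforce this in expectation."""
    return discounted_cost(costs, gamma) <= limit
```

Under these assumptions, the strictness of the constraint could be varied either by shrinking `radius` or by lowering `limit`; which of these the paper uses is not stated in the abstract.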

Abstract (translated)

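Existing learning methods for dexterous manipulation train black-box neural networks from demonstrations or from interaction with the environment, and they offer almost no control over how the robot learns a skill or how it behaves once trained. Deploying these methods on physical platforms is a major challenge because, early in training, the robot's behavior can be unpredictable and may harm its own hardware, the environment, or people nearby. One possible remedy is to add constraints during learning that restrict and guide the robot's behavior, both in training and in rollouts. Inspired by the success of constrained methods in other fields, we study the effect of position-based constraints on a 24-degree-of-freedom robot hand learning object relocation. We find that a simple geometric constraint can ensure that the robot moves toward the target object sooner than it does without constraints. In addition, training with this constraint needs a similar number of samples to master the skill as the unconstrained version. These findings show how simple constraints can help robots reach sensible and safe behavior quickly and reduce concerns about hardware deployment. We also study how the strictness of these constraints matters and report findings that give insight into the effects of different degrees of strictness. Our code is available at this https URL.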

URL

https://arxiv.org/abs/2301.09766

PDF

https://arxiv.org/pdf/2301.09766.pdf