Abstract
Robot policies are always subject to complex, second-order dynamics that entangle their actions with the resulting states. In reinforcement learning (RL) contexts, policies bear the burden of deciphering these complicated interactions from massive amounts of experience and complex reward functions in order to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induce straight-line motion towards these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely onto the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.
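The central idea, that a second-order artificial dynamical system mediates between bang-bang policy actions and safe robot motion, can be illustrated with a toy sketch. This is not the paper's geometric-fabric formulation; the attractor gains, damping, and inverse-square joint-limit barrier below are invented for the example, standing in for a fabric's geometric terms. The policy saturates its targets at the extremes, yet the integrated behavioral dynamics stay smooth and inside the limits.

```python
import numpy as np

def behavioral_step(q, qd, target, dt=0.01,
                    k=40.0, d=12.0, limit=1.0, barrier=5.0):
    """One semi-implicit Euler step of artificial second-order dynamics.

    All gains are illustrative, not taken from the paper.
    """
    # Attractor toward the policy's action target, with damping.
    acc = k * (target - q) - d * qd
    # Repulsive accelerations growing near the joint limits: a crude
    # stand-in for a fabric's barrier geometry.
    acc -= barrier / (limit - q + 1e-3) ** 2   # push away from upper limit
    acc += barrier / (q + limit + 1e-3) ** 2   # push away from lower limit
    qd = qd + dt * acc
    q = q + dt * qd
    return q, qd

q, qd = np.zeros(3), np.zeros(3)
history = []
for t in range(400):
    # Bang-bang-like policy action: targets flip between the extremes.
    target = np.full(3, 1.0 if (t // 100) % 2 == 0 else -1.0)
    q, qd = behavioral_step(q, qd, target)
    history.append(q.copy())
history = np.array(history)

print("final q:", np.round(q, 2))
```

Despite the extreme targets, every state in the rollout remains strictly inside the joint limits, which is the sense in which the behavioral dynamics make aggressive policy actions safe to execute.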
URL
https://arxiv.org/abs/2405.02250