Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

Abstract

Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use Tensor Train to construct the value function space, and rely on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a better approximation of cumulative rewards than state-of-the-art Reinforcement Learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.
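
The abstract's objective has two parts: choose a skill skeleton, then choose the subgoals that maximize the sum of per-skill value functions. A toy sketch can show this two-level structure. Everything in it is a hypothetical stand-in, not the paper's method:

- the skills `push` and `pick_place` and their hand-written value functions on a 1-D grid of integer cells;
- a breadth-first enumeration of skeletons in place of the paper's symbolic search;
- an exact dynamic program over the grid in place of its subgoal optimizer.

The paper learns its value functions with Tensor Train, which this sketch does not reproduce.

```python
import itertools
import math

# Hypothetical toy value functions V_k(x, g): estimated cumulative reward of running
# skill k from object position x to subgoal g (positions are integer grid cells).
def push_value(x, g):
    d = abs(g - x)
    return -10 * d if d <= 3 else -math.inf  # short reach, cost grows with distance

def pick_place_value(x, g):
    d = abs(g - x)
    return -(50 + 2 * d) if d <= 10 else -math.inf  # long reach, high fixed cost

SKILLS = {"push": push_value, "pick_place": pick_place_value}


def optimize_subgoals(skeleton, x0, goal, grid):
    """Maximize sum_t V_{k_t}(x_{t-1}, x_t) with x_0 = x0 and x_T = goal (exact DP on the grid)."""
    best = {x0: 0.0}  # best[x]: best cumulative value of ending the current prefix at x
    back = []         # back[t][x]: predecessor of x under the t-th skill
    for t, name in enumerate(skeleton):
        value_fn = SKILLS[name]
        # The last skill in the skeleton must end exactly at the geometric goal.
        targets = [goal] if t == len(skeleton) - 1 else grid
        new_best, new_back = {}, {}
        for g in targets:
            for x, acc in best.items():
                v = acc + value_fn(x, g)
                if v > new_best.get(g, -math.inf):  # infeasible (-inf) transitions are never stored
                    new_best[g], new_back[g] = v, x
        best = new_best
        back.append(new_back)
    if goal not in best:
        return -math.inf, None
    chain = [goal]
    for t in range(len(skeleton) - 1, -1, -1):
        chain.append(back[t][chain[-1]])
    chain.reverse()                 # [x0, x1, ..., goal]
    return best[goal], chain[1:-1]  # intermediate subgoals only


def plan(x0, goal, grid, max_len=3):
    """Enumerate skeletons breadth-first and keep the one with the highest optimized value."""
    best = (-math.inf, None, None)
    for length in range(1, max_len + 1):
        for skeleton in itertools.product(SKILLS, repeat=length):
            score, subgoals = optimize_subgoals(skeleton, x0, goal, grid)
            if score > best[0]:  # strict: shorter / earlier-enumerated skeletons win ties
                best = (score, skeleton, subgoals)
    return best


score, skeleton, subgoals = plan(x0=0, goal=12, grid=list(range(21)))
print(skeleton, subgoals, score)  # ('push', 'pick_place') [2] -90.0
```

Neither skill can reach the goal on its own, so the search has to combine them. Because the toy costs are additive, `('pick_place', 'push')` also scores -90. The strict comparison keeps the first skeleton enumerated.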

URL

https://arxiv.org/abs/2405.04082

PDF

https://arxiv.org/pdf/2405.04082.pdf