Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

2024-05-07 07:27:28
Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon

Abstract

Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use Tensor Train to construct the value function space, and alternate between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a superior approximation of cumulative rewards compared to state-of-the-art Reinforcement Learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.
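
The alternation between symbolic search and skill value optimization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D domain, the two skills (push, pick_place), their hand-written value functions, and the grid-based dynamic program are all assumptions made for illustration. In particular, the brute-force grid search stands in for the Tensor Train based optimization, and exhaustive enumeration stands in for the symbolic search. The sketch only shows the structure of the problem: choose a skill skeleton, then choose subgoals that maximize the sum of per-skill values, given only a final geometric goal.

```python
import itertools
import math

# Candidate subgoals: object positions 0.0, 0.5, ..., 4.0 (hypothetical 1-D domain).
GRID = [0.5 * i for i in range(9)]


def push_value(x0, x1):
    """Hypothetical value of pushing from x0 to x1; -inf when infeasible."""
    if max(x0, x1) > 2.0 or abs(x1 - x0) > 1.0:
        return -math.inf  # pushing only works on the table and over short moves
    return -0.5 - 2.0 * abs(x1 - x0)


def pick_place_value(x0, x1):
    """Hypothetical value of pick-and-place; the object must start within reach."""
    if x0 < 1.0:
        return -math.inf
    return -1.0 - 0.1 * abs(x1 - x0)


SKILLS = {"push": push_value, "pick_place": pick_place_value}


def optimize_subgoals(skeleton, start, goal):
    """Maximize the sum of skill values over intermediate subgoals on GRID."""
    # best[x] = (value, subgoal list) of the best partial plan ending at x
    best = {start: (0.0, [])}
    for step, name in enumerate(skeleton):
        is_last = step == len(skeleton) - 1
        targets = [goal] if is_last else GRID  # the final subgoal is the goal itself
        nxt = {}
        for x0, (v0, path) in best.items():
            for x1 in targets:
                v = v0 + SKILLS[name](x0, x1)
                if v > nxt.get(x1, (-math.inf, None))[0]:
                    nxt[x1] = (v, path + [x1])
        best = nxt
    return best.get(goal, (-math.inf, []))


def plan(start, goal, max_len=3):
    """Enumerate skeletons (outer loop) and optimize subgoals for each (inner step)."""
    best = (-math.inf, (), [])
    for n in range(1, max_len + 1):
        for skeleton in itertools.product(SKILLS, repeat=n):
            value, subgoals = optimize_subgoals(skeleton, start, goal)
            if value > best[0]:
                best = (value, skeleton, subgoals)
    return best


value, skeleton, subgoals = plan(0.0, 3.0)
print(skeleton, subgoals, round(value, 2))
```

In this toy setting neither skill reaches the goal alone: pushing cannot leave the table, and pick-and-place cannot start from the initial position. The search therefore selects a push into the reachable region followed by a pick-and-place, with the intermediate subgoal chosen to maximize the summed value.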

URL

https://arxiv.org/abs/2405.04082

PDF

https://arxiv.org/pdf/2405.04082.pdf
