Abstract
The ability to plan actions at multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low- and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories, associating the magnitude of the reward with how predictable the low-level actions are given the state and the chosen subgoal. We build a vector-quantized generative model over the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art. Because of its ability to plan, our algorithm can find better trajectories than those in the training set.
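The two core ideas in the abstract can be sketched compactly: the high-level agent's reward is the log-likelihood of the expert's action under a low-level policy conditioned on the chosen subgoal, and subgoals live in a discrete codebook via vector quantization. The toy policy parameterization (`W`, softmax over discrete actions) and the function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def low_level_policy(state, subgoal, W):
    # Toy low-level policy: softmax over discrete actions,
    # conditioned on concatenated (state, subgoal) features.
    logits = W @ np.concatenate([state, subgoal])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def subgoal_reward(state, action, subgoal, W):
    # Reward for the subgoal-proposing agent: the log-likelihood
    # of the expert action under the low-level policy, i.e. how
    # predictable the action becomes given the chosen subgoal.
    probs = low_level_policy(state, subgoal, W)
    return np.log(probs[action] + 1e-12)

def quantize(subgoal, codebook):
    # Vector quantization: snap a continuous subgoal embedding
    # to its nearest codebook entry, as in a VQ generative model.
    dists = np.linalg.norm(codebook - subgoal, axis=1)
    idx = int(np.argmin(dists))
    return codebook[idx], idx
```

In this sketch, subgoals that make expert actions more predictable yield rewards closer to zero (log-probabilities near 0), so reinforcement learning over subgoal proposals favors informative subgoals.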
URL
https://arxiv.org/abs/2301.12962