Planning with State Abstractions for Non-Markovian Task Specifications

Abstract
Abstract (translated)
URL
PDF

Abstract

Often times, we specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command ``go to the kitchen before going to the second floor'' contains spatial abstraction, given that ``floor'' consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the word "before". Previous works have used Linear Temporal Logic (LTL) to interpret temporal language (such as "before"), and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as "kitchen" and "second floor"), separately. To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstractions. The AP-MDP framework translates LTL into its corresponding automata, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of tasks, and this number only increases as the size of the environment domain increases. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expression, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on a drone, demonstrating that our approach enables a robot to efficiently solve temporal commands at different levels of abstraction.

Abstract (translated)

通常，我们使用时间语言为机器人指定任务，这种语言也可以跨越不同的抽象级别。示例命令“去二楼之前先去厨房”包含空间抽象，因为“楼层”由单独的房间组成，也可以单独引用这些房间（“厨房”）。事件也有一个时间顺序，由单词“before”定义。以前的作品分别用线性时间逻辑（LTL）来解释时间语言（如“之前”）和抽象马尔可夫决策过程（AMDPS）来解释层次抽象（如“厨房”和“二楼”）。为了同时处理这两种类型的命令，我们引入了抽象产品马尔可夫决策过程（AP-MDP），这是一种能够在不同抽象层次上表示非马尔可夫报酬函数的新方法。AP-MDP框架将LTL转换为相应的自动机，创建LTL规范和环境MDP的产品马尔可夫决策过程（MDP），并将问题分解为子问题，以实现高效的抽象规划。在95%以上的任务中，AP-MDP比非层次的LTL问题解决方法执行得更快，而且这个数字只随着环境域大小的增加而增加。我们还提出了一个训练用于将语言命令转换为LTL表达式的神经序列到序列模型，以及一个跨越不同抽象层次的新的非马尔可夫语言命令集。我们在无人机上用收集到的语言命令测试我们的框架，证明我们的方法能够使机器人在不同的抽象层次上有效地解决时间命令。

URL

https://arxiv.org/abs/1905.12096

PDF

https://arxiv.org/pdf/1905.12096.pdf