Planning with State Abstractions for Non-Markovian Task Specifications

2019-05-28 21:28:17
Yoonseon Oh, Roma Patel, Thao Nguyen, Baichuan Huang, Ellie Pavlick, Stefanie Tellex

Abstract

We often specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command "go to the kitchen before going to the second floor" contains spatial abstraction, given that "floor" consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the word "before". Previous work has used Linear Temporal Logic (LTL) to interpret temporal language (such as "before"), and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as "kitchen" and "second floor"), but separately. To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstraction. The AP-MDP framework translates an LTL specification into its corresponding automaton, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of tasks, and this percentage only increases as the size of the environment domain increases. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expressions, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on a drone, demonstrating that our approach enables a robot to efficiently solve temporal commands at different levels of abstraction.
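To make the product construction concrete, here is a minimal Python sketch of planning over pairs of (environment state, automaton state) for the example command. It is not the authors' implementation, and several things in it are assumptions made for illustration: the four-room map, the proposition names, the hand-written automaton (the paper translates LTL into the automaton automatically), deterministic movement, and breadth-first search in place of MDP planning. It also omits the hierarchical decomposition into subproblems.

```python
from collections import deque

# Hypothetical four-room map (not from the paper): room -> adjacent rooms.
ROOMS = {
    "hall": ["kitchen", "stairs"],
    "kitchen": ["hall"],
    "stairs": ["hall", "bedroom"],
    "bedroom": ["stairs"],
}

# Atomic propositions that hold in each room. "second_floor" is the abstract
# label shared by every room on that floor (here only the bedroom).
LABELS = {
    "hall": set(),
    "kitchen": {"kitchen"},
    "stairs": set(),
    "bedroom": {"second_floor"},
}


def automaton_step(q, props):
    """Hand-written automaton for 'kitchen before second floor'."""
    if q == "start":
        if "second_floor" in props:
            return "reject"  # reached the second floor too early
        return "seen_kitchen" if "kitchen" in props else "start"
    if q == "seen_kitchen":
        return "accept" if "second_floor" in props else "seen_kitchen"
    return q  # "accept" and "reject" are absorbing


def plan(start_room):
    """Breadth-first search over product states (room, automaton state)."""
    start = (start_room, automaton_step("start", LABELS[start_room]))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        room, q = state
        if q == "accept":
            # Walk parent links back to the start to recover the room sequence.
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if q == "reject":
            continue  # the task can no longer be satisfied from here
        for nxt in ROOMS[room]:
            succ = (nxt, automaton_step(q, LABELS[nxt]))
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return None  # no satisfying path exists


print(plan("hall"))  # ['hall', 'kitchen', 'hall', 'stairs', 'bedroom']
```

The automaton state is what makes the task Markovian again: the same room appears in several product states, so the planner can distinguish "in the hall before visiting the kitchen" from "in the hall after visiting the kitchen".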

URL

https://arxiv.org/abs/1905.12096

PDF

https://arxiv.org/pdf/1905.12096.pdf

