Reinforcement Learning for Task Specifications with Action-Constraints

2022-01-02 04:22:01

Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar

arXiv_AI

Abstract

In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method for learning optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are deemed unsafe (respectively, safe). We assume that the set of action sequences deemed unsafe and/or safe is given as a finite-state automaton, and we propose a supervisor that disables a subset of actions at every state of the MDP so that the constraints on action sequences are satisfied. We then present a version of the Q-learning algorithm for learning optimal policies in the presence of non-Markovian action-sequence and state constraints, where we use the reward-machine framework to handle the state constraints. We illustrate the method with an example that captures the utility of automata-based methods for non-Markovian state and action specifications in reinforcement learning, and we show simulation results in this setting.
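The supervisor idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: the corridor MDP, the "never take R twice in a row" constraint, and every name below (`dfa_step`, `enabled`, `mdp_step`, `train`, `greedy_rollout`) are assumptions made for illustration, and the reward-machine component for state constraints is omitted. The sketch tracks an automaton state alongside the MDP state, disables any action that would drive the automaton into its unsafe state, and runs tabular Q-learning over the enabled actions only.

```python
import random
from collections import defaultdict

# Hypothetical toy setup: a 5-cell corridor, start at cell 0, goal at cell 4.
ACTIONS = ["L", "R", "S"]          # left, right, stay
N_STATES, GOAL = 5, 4
UNSAFE = 2                         # automaton state reached after "RR"

def dfa_step(q, a):
    """Automaton over actions: q=0 (last action was not R), q=1 (last was R)."""
    if a == "R":
        return 1 if q == 0 else UNSAFE
    return 0

def enabled(q):
    """Supervisor: keep only actions that do not lead to the unsafe state."""
    return [a for a in ACTIONS if dfa_step(q, a) != UNSAFE]

def mdp_step(s, a):
    """Deterministic corridor dynamics with a step cost of -1."""
    if a == "R":
        s2 = min(s + 1, N_STATES - 1)
    elif a == "L":
        s2 = max(s - 1, 0)
    else:
        s2 = s
    return s2, -1.0, s2 == GOAL

def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning on the product (MDP state, automaton state)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(100):
            acts = enabled(q)
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: Q[(s, q, b)])
            s2, r, done = mdp_step(s, a)
            q2 = dfa_step(q, a)
            # Bootstrap only over actions the supervisor would allow next.
            nxt = 0.0 if done else max(Q[(s2, q2, b)] for b in enabled(q2))
            Q[(s, q, a)] += alpha * (r + gamma * nxt - Q[(s, q, a)])
            s, q = s2, q2
            if done:
                break
    return Q

def greedy_rollout(Q, max_steps=50):
    """Follow the greedy supervised policy; return final cell and action trace."""
    s, q, trace = 0, 0, []
    for _ in range(max_steps):
        a = max(enabled(q), key=lambda b: Q[(s, q, b)])
        trace.append(a)
        s, _, done = mdp_step(s, a)
        q = dfa_step(q, a)
        if done:
            break
    return s, "".join(trace)

if __name__ == "__main__":
    final_state, trace = greedy_rollout(train())
    print(final_state, trace)
```

Because exploration and greedy selection both draw from `enabled(q)`, the constraint holds during learning as well as after it, which is the point of placing a supervisor in front of the learner rather than penalizing violations through the reward.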

URL

https://arxiv.org/abs/2201.00286

PDF

https://arxiv.org/pdf/2201.00286.pdf