Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes

2021-07-09 19:00:39

Alvaro Velasquez, Andre Beckus, Taylor Dohmen, Ashutosh Trivedi, Noah Topper, George Atia

arXiv_AI

arXiv_AI Reinforcement_Learning Agent

Abstract

The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.
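To make the idea of a probabilistic reward machine concrete, here is a minimal Python sketch. It assumes one simple encoding, which is not necessarily the formal definition used in the paper. Each (machine state, label) pair maps to a list of (probability, next state, reward) outcomes, and any label with no listed transition self-loops with zero reward. The class name, the state names u0 to u2, and the labels "key" and "goal" are hypothetical. The example reward is non-Markovian because reaching "goal" pays off only if "key" was observed earlier. It is stochastic because, even then, it pays 1 only with probability 0.8.

```python
import random
from dataclasses import dataclass

# Hypothetical encoding (an assumption, not the paper's definition):
# (machine_state, label) -> list of (probability, next_state, reward)
Transitions = dict[tuple[str, str], list[tuple[float, str, float]]]


@dataclass
class ProbabilisticRewardMachine:
    initial: str
    transitions: Transitions

    def __post_init__(self) -> None:
        # Each outcome list must form a probability distribution.
        for key, outcomes in self.transitions.items():
            total = sum(p for p, _, _ in outcomes)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"probabilities for {key} sum to {total}, not 1")

    def step(self, state: str, label: str, rng: random.Random) -> tuple[str, float]:
        outcomes = self.transitions.get((state, label))
        if outcomes is None:
            # Assumption: unlisted labels self-loop with zero reward.
            return state, 0.0
        # Sample one (next_state, reward) outcome by inverse-CDF sampling.
        r = rng.random()
        cumulative = 0.0
        for p, nxt, reward in outcomes:
            cumulative += p
            if r < cumulative:
                return nxt, reward
        # Guard against floating-point rounding in the cumulative sum.
        return outcomes[-1][1], outcomes[-1][2]

    def run(self, labels: list[str], rng: random.Random) -> tuple[str, list[float]]:
        # Feed a label sequence through the machine, collecting rewards.
        state = self.initial
        rewards = []
        for label in labels:
            state, reward = self.step(state, label, rng)
            rewards.append(reward)
        return state, rewards


# Hypothetical task: "goal" pays 1 with probability 0.8, but only after "key".
prm = ProbabilisticRewardMachine(
    initial="u0",
    transitions={
        ("u0", "key"): [(1.0, "u1", 0.0)],
        ("u1", "goal"): [(0.8, "u2", 1.0), (0.2, "u2", 0.0)],
    },
)

final_state, rewards = prm.run(["goal", "key", "goal"], random.Random(0))
print(final_state, rewards)
```

Following the state-augmentation idea in the abstract, an agent would condition its policy on the pair (environment state, machine state) rather than on the environment state alone. This sketch omits the environment and the learning algorithm entirely; it only shows how such a machine could emit history-dependent, random rewards.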

URL

https://arxiv.org/abs/2107.04633

PDF

https://arxiv.org/pdf/2107.04633.pdf