Average-reward model-free reinforcement learning: a systematic review and literature mapping

2020-10-18 05:06:01

Vektor Dewanto, George Dunn, Ali Eshragh, Marcus Gallagher, Fred Roosta

arXiv_AI

arXiv_AI Review Survey Reinforcement_Learning Optimization Agent

Abstract

Model-free reinforcement learning (RL) has been an active area of research and provides a fundamental framework for agent-based learning and decision-making in artificial intelligence. In this paper, we review a specific subset of this literature, namely work that utilizes optimization criteria based on average rewards in the infinite-horizon setting. Average-reward RL has the advantage of being the most selective criterion in recurrent (ergodic) Markov decision processes. Compared with the widely used discounted-reward criterion, it also requires no discount factor, which is a critical hyperparameter, and it properly aligns the optimization and performance metrics. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function-approximation methods (in addition to the value-iteration and tabular counterparts). We also identify and discuss opportunities for future work.

URL

https://arxiv.org/abs/2010.08920

PDF

https://arxiv.org/pdf/2010.08920.pdf