A study of first-passage time minimization via Q-learning in heated gridworlds

2021-10-05 16:01:44

M.A. Larchenko, P. Osinenko, G. Yaremenko, V.V. Palyulin

arXiv_AI

Abstract

Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters noise levels that are unevenly distributed across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results show certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. While a high learning rate prevents exploration of regions with higher temperature, a sufficiently low rate increases the presence of agents in such regions. The discovered peculiarities and biases of temporal-difference-based reinforcement learning methods should be taken into account in real-world physical applications and agent design.
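
To make the setting concrete, here is a minimal sketch of tabular Q-learning in a one-dimensional heated gridworld. It is not the authors' implementation. Several modeling choices are assumptions made purely for illustration: temperature is treated as the per-cell probability that the chosen move is replaced by a random one, every step costs a reward of -1 so that maximizing return corresponds to minimizing first-passage time, and the goal is the rightmost cell. The names `step`, `q_learning` and `temps` are hypothetical.

```python
import random


def step(state, action, temps, rng):
    """One move in a hypothetical 1-D heated gridworld.

    Assumption: with probability temps[state] the intended action is
    replaced by a uniformly random one (a stand-in for thermal noise).
    """
    if rng.random() < temps[state]:
        action = rng.choice((-1, 1))
    # Walls at both ends: clamp the position to the grid.
    next_state = min(max(state + action, 0), len(temps) - 1)
    done = next_state == len(temps) - 1  # goal is the rightmost cell
    return next_state, -1.0, done        # -1 per step penalizes long passages


def q_learning(temps, episodes=500, alpha=0.1, gamma=1.0,
               epsilon=0.1, max_steps=10_000, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    actions = (-1, 1)                    # move left, move right
    q = [[0.0, 0.0] for _ in temps]      # q[state][action_index]
    for _ in range(episodes):
        state = 0                        # every episode starts at the left end
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(2)     # explore
            else:
                a = max(range(2), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, actions[a], temps, rng)
            # Q-learning target: bootstrap from the greedy value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


# Example: a six-cell world with a "hot" region in the middle.
temps = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
q_table = q_learning(temps)
```

In this sketch the learning rate discussed in the abstract is the `alpha` parameter; rerunning with different `alpha` values and temperature profiles is one way to probe the kind of bias the abstract describes, although the toy model may not reproduce the paper's results.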

URL

https://arxiv.org/abs/2110.02129

PDF

https://arxiv.org/pdf/2110.02129.pdf