Parameter-Free Deterministic Reduction of the Estimation Bias in Continuous Control

2021-09-24 07:41:07

Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

arXiv_AI

arXiv_AI Reinforcement_Learning Agent

Abstract

Approximating the value functions in value-based deep reinforcement learning systems induces an overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We introduce a novel, parameter-free deep Q-learning variant that reduces this underestimation bias in continuous control. Our Q-value update rule integrates the concepts of Clipped Double Q-learning and Maxmin Q-learning by computing the critic objective as a linear combination of the approximate critic functions with fixed weights. We evaluate our method on a set of MuJoCo and Box2D continuous control tasks and find that it improves on the state of the art, outperforming the baseline algorithms in the majority of the environments.
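The abstract does not state the exact weights, so the following is only a hypothetical sketch in Python, not the authors' update rule. It assumes the fixed-weight combination is taken over the critics' target estimates sorted in ascending order. Under that assumption, Clipped Double Q-learning (minimum of two critics) and Maxmin Q-learning (minimum of N critics) are special cases that put all the weight on the smallest estimate. The function names and example weights are illustrative assumptions.

```python
from typing import Sequence


def clipped_double_q(q1: float, q2: float) -> float:
    """Clipped Double Q-learning: use the smaller of two critic estimates."""
    return min(q1, q2)


def maxmin_q(qs: Sequence[float]) -> float:
    """Maxmin Q-learning: use the minimum over N critic estimates."""
    return min(qs)


def fixed_weight_combination(qs: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical fixed-weight combination of sorted critic estimates.

    Estimates are sorted ascending, so weights[0] multiplies the minimum.
    The weights are illustrative assumptions, not the paper's values.
    """
    if len(qs) != len(weights):
        raise ValueError("need exactly one weight per critic")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 to stay on the Q-value scale")
    return sum(w * q for w, q in zip(weights, sorted(qs)))


def td_target(reward: float, done: bool, gamma: float, q_next: float) -> float:
    """One-step bootstrapped target; no bootstrapping at terminal states."""
    return reward + gamma * (1.0 - float(done)) * q_next


if __name__ == "__main__":
    q_next = [14.0, 10.0, 12.0]  # next-state estimates from three critics
    # All weight on the minimum reproduces Maxmin Q-learning.
    print(fixed_weight_combination(q_next, [1.0, 0.0, 0.0]))  # 10.0
    # A hypothetical blend of the two smallest estimates is less pessimistic.
    print(fixed_weight_combination(q_next, [0.5, 0.5, 0.0]))  # 11.0
```

Shifting weight away from the minimum raises the target. This is one way a fixed, parameter-free rule could counteract the underestimation that pure minimum-based targets introduce.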

URL

https://arxiv.org/abs/2109.11788

PDF

https://arxiv.org/pdf/2109.11788.pdf