Cautious Actor-Critic

2021-07-12 06:40:02

Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara

arXiv_AI

Abstract

The oscillating performance of off-policy learning and the persistent errors of the actor-critic (AC) setting call for algorithms that learn conservatively and thus better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name "cautious" comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
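
As a rough illustration of the two ingredients named in the abstract, the following minimal sketch shows a conservative-policy-iteration-style mixture update applied to a Boltzmann (softmax) target policy for a single state with a handful of discrete actions. It is not the CAC algorithm itself, which targets continuous control. The function names, the temperature, the mixing coefficient `zeta`, and the toy action values are all assumptions made for this example.

```python
import math


def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy over action values (hypothetical helper).

    An entropy-regularized greedy step yields a softmax over action
    values instead of a hard argmax.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_policy(old_policy, target_policy, zeta):
    """Mixture update: (1 - zeta) * old + zeta * target, zeta in [0, 1].

    A small zeta keeps the new policy close to the old one, which is the
    conservative part of conservative policy iteration.
    """
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return [
        (1.0 - zeta) * p_old + zeta * p_new
        for p_old, p_new in zip(old_policy, target_policy)
    ]


# Toy single-state example (assumed values, not from the paper).
old_policy = [0.25, 0.25, 0.25, 0.25]
q_values = [1.0, 2.0, 0.5, 0.0]

target = softmax_policy(q_values, temperature=0.5)
new_policy = interpolate_policy(old_policy, target, zeta=0.1)
print(new_policy)
```

With `zeta=0.1`, the new policy shifts only slightly toward the action with the highest value, and setting `zeta=0.0` leaves the old policy unchanged. How CAC chooses the mixing coefficient and handles continuous actions is described in the paper, not in this sketch.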

URL

https://arxiv.org/abs/2107.05217

PDF

https://arxiv.org/pdf/2107.05217.pdf