Fast Global Convergence of Policy Optimization for Constrained MDPs

2021-10-31 17:46:26

Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

arXiv_AI

Abstract

We address the issue of safety in reinforcement learning. We pose the problem in a discounted infinite-horizon constrained Markov decision process framework. Existing results have shown that gradient-based methods are able to achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate both for the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that has a faster convergence rate $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$ while maintaining the same convergence rate.
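
To make the quantities in the abstract concrete, here is a sketch of the usual constrained MDP setup. The notation is an assumption for illustration, not taken from the paper: $V_r^{\pi}(\rho)$ and $V_g^{\pi}(\rho)$ denote the discounted reward value and utility value of policy $\pi$ from initial distribution $\rho$, $b$ is the constraint threshold, $\pi^*$ is an optimal feasible policy, and $\pi_1, \dots, \pi_T$ are the iterates. The problem is

$$
\max_{\pi} \; V_r^{\pi}(\rho) \quad \text{subject to} \quad V_g^{\pi}(\rho) \ge b .
$$

One common way to measure the two error terms over $T$ iterations (the paper may use a different but related definition) is

$$
\text{optimality gap: } \frac{1}{T}\sum_{t=1}^{T}\left(V_r^{\pi^*}(\rho) - V_r^{\pi_t}(\rho)\right),
\qquad
\text{constraint violation: } \left[\frac{1}{T}\sum_{t=1}^{T}\left(b - V_g^{\pi_t}(\rho)\right)\right]_{+},
$$

where $[x]_+ = \max(x, 0)$. In this notation, Slater's condition states that some policy $\bar{\pi}$ is strictly feasible, that is, $V_g^{\bar{\pi}}(\rho) \ge b + \xi$ for some $\xi > 0$.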

URL

https://arxiv.org/abs/2111.00552

PDF

https://arxiv.org/pdf/2111.00552.pdf