Computational Benefits of Intermediate Rewards for Hierarchical Planning

2021-07-08 16:39:13

Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma

arXiv_AI

Abstract
Abstract (translated)
URL
PDF

Abstract

Many hierarchical reinforcement learning (RL) applications have empirically verified that incorporating prior knowledge in reward design improves convergence speed and practical performance. We attempt to quantify the computational benefits of hierarchical RL from a planning perspective under assumptions about the intermediate state and intermediate rewards frequently (but often implicitly) adopted in practice. Our approach reveals a trade-off between computational complexity and the pursuit of the shortest path in hierarchical planning: using intermediate rewards significantly reduces the computational complexity in finding a successful policy but does not guarantee to find the shortest path, whereas using sparse terminal rewards finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and other popular deep RL algorithms.

Abstract (translated)

URL

https://arxiv.org/abs/2107.03961

PDF

https://arxiv.org/pdf/2107.03961.pdf