Abstract
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased-objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from the Procgen games and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmark schemes and achieves superior performance with a simple architecture.
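The core idea of selecting a shaping function from a predefined set based on the estimated task return can be sketched as a bandit-style selector. The sketch below is a hypothetical illustration, not AIRS's actual algorithm: the class name, the UCB1 selection rule, and the running-mean return estimate are all assumptions made for the example.

```python
import math

class IntrinsicRewardSelector:
    """Hypothetical sketch: choose one intrinsic-reward shaping function
    from a predefined set, using a UCB1-style bandit over estimated task
    returns. Illustrative only; AIRS's exact rule may differ."""

    def __init__(self, shaping_fns):
        self.shaping_fns = shaping_fns                 # callables: state -> intrinsic reward
        self.counts = [0] * len(shaping_fns)           # times each function was selected
        self.mean_returns = [0.0] * len(shaping_fns)   # running mean of observed task returns

    def select(self):
        # Try each shaping function once, then pick by UCB1 on estimated returns.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        total = sum(self.counts)
        ucb = [m + math.sqrt(2 * math.log(total) / c)
               for m, c in zip(self.mean_returns, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, idx, episode_return):
        # Incremental running-mean update of the estimated task return.
        self.counts[idx] += 1
        self.mean_returns[idx] += (episode_return - self.mean_returns[idx]) / self.counts[idx]
```

In use, the agent would call `select()` at the start of each episode, train with the chosen shaping function's intrinsic reward, and feed the resulting episode return back via `update()`, so that functions yielding higher task returns are chosen more often over time.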
URL
https://arxiv.org/abs/2301.10886