Paper Reading AI Learner

Exploration via Flow-Based Intrinsic Rewards

2019-05-24 07:32:30
Hsuan-Kung Yang, Po-Han Chiang, Min-Fong Hong, Chun-Yi Lee

Abstract

Exploration bonuses derived from the novelty of observations have become a popular approach to motivating exploration in reinforcement learning (RL) agents in recent years. Recent methods, such as curiosity-driven exploration, typically estimate the novelty of new observations through the prediction errors of their system dynamics models. In this paper, we introduce the concept of optical flow estimation from the field of computer vision to the RL domain and utilize the errors from optical flow estimation to evaluate the novelty of new observations. We propose a flow-based intrinsic curiosity module (FICM) capable of learning motion features and understanding observations in a more comprehensive and efficient fashion. We evaluate our method against a number of baselines on several benchmark environments, including Atari games, Super Mario Bros., and ViZDoom. Our results show that the proposed method outperforms the baselines in certain environments, especially those featuring sophisticated moving patterns or high-dimensional observation spaces. We further analyze the hyper-parameters used in training and discuss our insights into them.
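For intuition, here is a minimal sketch of how such a flow-based intrinsic reward could be computed. It assumes a toy convolutional flow predictor and a warping-based reconstruction loss; the class name `FlowBasedCuriosity`, the layer sizes, and the loss choice are illustrative assumptions for this note, not the architecture described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowBasedCuriosity(nn.Module):
    """Sketch of a flow-based intrinsic curiosity signal (hypothetical).

    A small network predicts optical flow between two consecutive
    observations; the previous frame is warped with the predicted flow,
    and the reconstruction error against the current frame serves as
    the intrinsic reward. Observations whose motion the flow predictor
    has not yet learned to model yield large errors, i.e. large bonuses.
    """

    def __init__(self, in_channels=3):
        super().__init__()
        # Encoder over the stacked frame pair -> 2-channel flow field
        # (flow expressed in normalized [-1, 1] grid coordinates).
        self.flow_net = nn.Sequential(
            nn.Conv2d(in_channels * 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, obs_prev, obs_curr):
        # Predict flow from the concatenated consecutive frames.
        flow = self.flow_net(torch.cat([obs_prev, obs_curr], dim=1))

        # Identity sampling grid shifted by the predicted flow.
        n, _, h, w = obs_prev.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        grid = base + flow.permute(0, 2, 3, 1)

        # Warp the previous frame toward the current one.
        warped = F.grid_sample(obs_prev, grid, align_corners=True)

        # Per-sample reconstruction error doubles as the training loss
        # for the flow network and the intrinsic reward for the agent.
        err = F.mse_loss(warped, obs_curr, reduction="none")
        return err.mean(dim=(1, 2, 3))

ficm = FlowBasedCuriosity()
prev_frames = torch.rand(8, 3, 84, 84)   # batch of previous observations
curr_frames = torch.rand(8, 3, 84, 84)   # batch of current observations
intrinsic_reward = ficm(prev_frames, curr_frames)  # shape: (8,)
```

The design point the abstract highlights is that novelty is measured by how well the motion between consecutive observations can be estimated, rather than by the forward-prediction error of a system dynamics model as in earlier curiosity-driven methods.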

URL

https://arxiv.org/abs/1905.10071

PDF

https://arxiv.org/pdf/1905.10071.pdf

