Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Abstract

In this work, we investigate how curiosity can be applied to replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are unlabeled and are not evenly exposed to the learner over time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric for retaining old transition tuples, and we use each of these roles to propose a different buffer. First, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), where curiosity is used to detect task boundaries that are unknown because of the task-agnostic nature of the problem. Second, we propose a Hybrid Curious Buffer (HCB), where curiosity serves as the priority metric that decides which old transition tuples are retained. We show that these buffers, in conjunction with regular reinforcement learning algorithms, can alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent's exposure to tasks is uneven over time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against recent works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. Experiments were conducted on classical control tasks and the Metaworld environment. They show that our proposed replay buffers are more resistant to catastrophic forgetting than existing works in most of the settings.
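
The two roles the abstract assigns to curiosity can be illustrated with a minimal sketch. It is not the authors' HRBTS or HCB: the abstract does not say how curiosity is computed or how the buffers are built, so the code assumes curiosity arrives as one non-negative scalar per transition (for example, a forward-model prediction error). The class names, the windowed spike rule, and the `window` and `threshold` parameters are all hypothetical.

```python
import heapq
import random
from collections import deque


class CuriosityBoundaryDetector:
    """Flags a possible task boundary when curiosity spikes above its recent mean.

    Assumption: a sudden jump in curiosity signals a change of task.
    """

    def __init__(self, window=50, threshold=3.0):
        self.history = deque(maxlen=window)  # recent curiosity values
        self.threshold = threshold           # hypothetical spike ratio

    def update(self, curiosity):
        is_boundary = False
        # Only test for a spike once the window is full.
        if len(self.history) == self.history.maxlen:
            mean = sum(self.history) / len(self.history)
            is_boundary = curiosity > self.threshold * max(mean, 1e-8)
        if is_boundary:
            self.history.clear()  # restart statistics for the new task
        self.history.append(curiosity)
        return is_boundary


class CuriosityPriorityBuffer:
    """Fixed-size buffer that evicts the least curious transition first.

    Assumption: higher curiosity means the transition is more worth keeping.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []    # min-heap of (curiosity, insertion index, transition)
        self.counter = 0  # unique tie-breaker so transitions are never compared

    def add(self, transition, curiosity):
        item = (curiosity, self.counter, transition)
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif curiosity > self.heap[0][0]:
            # Replace the current least curious entry.
            heapq.heapreplace(self.heap, item)

    def sample(self, batch_size, rng=random):
        # Uniform sampling over whatever the buffer has retained.
        k = min(batch_size, len(self.heap))
        return [t for _, _, t in rng.sample(self.heap, k)]
```

In this sketch the detector could be used to start a new per-task partition of a buffer whenever `update` returns `True`, while the priority buffer keeps rarely seen, high-curiosity transitions from being overwritten by a task that dominates the data stream.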

URL

https://arxiv.org/abs/2312.03177

PDF

https://arxiv.org/pdf/2312.03177.pdf
