Abstract
Despite the success of Random Network Distillation (RND) in various domains, it was shown to be insufficiently discriminative to serve as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus, and that discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it achieves performance comparable to ensemble-based methods and outperforms ensemble-free approaches by a wide margin.
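To make the FiLM mechanism mentioned above concrete, here is a minimal NumPy sketch of Feature-wise Linear Modulation: a conditioning input (here, an action batch) is mapped to per-feature scales and shifts that modulate a feature vector (here, a state embedding). All names, shapes, and the linear conditioning maps are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, gamma, beta):
    # Feature-wise Linear Modulation: per-feature scale and shift,
    # with gamma and beta produced from the conditioning input.
    return gamma * features + beta

# Hypothetical shapes: batch of 4, 8 state features, 3 action dims.
h = rng.normal(size=(4, 8))            # state embeddings (to be modulated)
a = rng.normal(size=(4, 3))            # actions (conditioning input)
W_gamma = rng.normal(size=(3, 8))      # linear map: action -> scales
W_beta = rng.normal(size=(3, 8))       # linear map: action -> shifts

gamma, beta = a @ W_gamma, a @ W_beta  # FiLM parameters from the action
out = film(h, gamma, beta)
print(out.shape)  # (4, 8): same shape as the modulated features
```

In an RND prior, such a FiLM layer lets the action rescale and shift the prior network's intermediate features rather than simply being concatenated to the state, which is the kind of conditioning change the abstract credits with restoring discriminativity.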
URL
https://arxiv.org/abs/2301.13616