Abstract
Representation rank is an important concept for understanding the role of neural networks (NNs) in deep reinforcement learning (DRL): it measures the expressive capacity of value networks. Existing studies focus on maximizing this rank without bound; however, doing so introduces overly complex models into learning and thereby undermines performance. Fine-tuning the representation rank therefore presents a challenging and crucial optimization problem. To address this issue, we identify a guiding principle for adaptive control of the representation rank. Taking the Bellman equation as a theoretical foundation, we derive an upper bound on the cosine similarity between the value network's representations of consecutive state-action pairs. We then leverage this upper bound to propose a novel regularizer, the BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent's performance. We first validate the effectiveness of automatic rank control in illustrative experiments. We then scale BEER up to complex continuous control tasks by combining it with the deterministic policy gradient method. Across 12 challenging DeepMind Control tasks, BEER outperforms the baselines by a large margin. Moreover, BEER demonstrates significant advantages in Q-value approximation. Our code is available at this https URL.
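To make the mechanism described above concrete, here is a minimal PyTorch sketch of a BEER-style penalty. It assumes the value network exposes a representation phi(s, a), and it penalizes the cosine similarity between representations of consecutive state-action pairs only where it exceeds a precomputed bound. The names (beer_regularizer, bound, coef) and the hinge form of the penalty are illustrative assumptions; the paper derives the actual bound from the Bellman equation, which is not reproduced here.

    import torch
    import torch.nn.functional as F

    def beer_regularizer(phi, phi_next, bound, coef=1e-3):
        # phi:      representations of current state-action pairs, shape (B, d)
        # phi_next: representations of successor state-action pairs, shape (B, d)
        # bound:    assumed Bellman-derived upper bound on cosine similarity
        #           (the paper derives this bound; its exact form is not shown here)
        # coef:     hypothetical regularization weight
        cos = F.cosine_similarity(phi, phi_next, dim=-1)  # shape (B,)
        # Penalize only the excess over the bound, so the representation rank
        # is controlled adaptively rather than maximized without limit.
        violation = F.relu(cos - bound)
        return coef * violation.mean()

In such a sketch, the total critic loss would be the usual TD loss plus this term, letting the regularizer adaptively constrain the representation rank during training.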
URL
https://arxiv.org/abs/2404.12754