Abstract
Discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant is a key challenge toward scaling reinforcement learning algorithms and efficiently applying them to downstream tasks. Prior works studied this problem in high-dimensional Markovian environments, where the current observation may be a complex object but is sufficient to decode the informative state. In this work, we consider the problem of discovering the agent-centric state in the more challenging high-dimensional non-Markovian setting, where the state can be decoded from a sequence of past observations. We establish that generalized inverse models can be adapted to learn an agent-centric state representation for this task. Our results include asymptotic theory in the deterministic-dynamics setting as well as counterexamples for alternative intuitive algorithms. We complement these findings with a thorough empirical study of the agent-centric state discovery abilities of the different alternatives we put forward. Particularly notable is our analysis of past actions, where we show that these can be a double-edged sword: they make the algorithms more successful when used correctly and cause dramatic failure when used incorrectly.
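As a rough illustration of the inverse-model idea the abstract refers to, the sketch below builds a toy environment whose observations mix a controllable agent position with distractor noise, and trains a simple action-prediction (inverse) model. This is a simplification for intuition only, not the paper's "generalized inverse model": the linear softmax regression, the choice of observation differences as features, and all variable names are assumptions of this sketch. The point it demonstrates is that predicting the action between consecutive observations only requires the agent-centric coordinates, so the learned weights concentrate there and largely ignore the distractors.

```python
import numpy as np

# Toy sketch (illustrative only, not the paper's exact objective).
rng = np.random.default_rng(0)
T, n_actions = 500, 4
moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])  # deterministic dynamics

actions = rng.integers(0, n_actions, size=T - 1)
agent = np.zeros((T, 2))
for t in range(T - 1):
    agent[t + 1] = agent[t] + moves[actions[t]]       # agent's grid random walk

noise = rng.normal(size=(T, 8))                       # agent-irrelevant distractors
obs = np.concatenate([agent, noise], axis=1)          # 10-dim observation

# Inverse model: softmax regression predicting a_t from the observation
# difference o_{t+1} - o_t (a simplifying feature choice for this sketch).
x = obs[1:] - obs[:-1]
Y = np.eye(n_actions)[actions]                        # one-hot action targets
W = np.zeros((x.shape[1], n_actions))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(300):                                  # gradient descent on CE loss
    W -= 0.5 * x.T @ (softmax(x @ W) - Y) / len(x)

acc = (softmax(x @ W).argmax(1) == actions).mean()
agent_w = np.abs(W[:2]).mean()                        # weight mass on agent dims
noise_w = np.abs(W[2:]).mean()                        # weight mass on distractors
print(f"action-prediction accuracy: {acc:.2f}")
print(f"mean |weight|, agent dims vs distractor dims: {agent_w:.2f} vs {noise_w:.2f}")
```

After training, the action is predicted almost perfectly and the weights on the two agent coordinates dominate those on the eight noise coordinates, which is the sense in which inverse models recover agent-centric structure. The paper's setting is harder: observations are high-dimensional and non-Markovian, so the state must be decoded from a history of observations rather than a single transition.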
URL
https://arxiv.org/abs/2404.14552