Effective Reinforcement Learning Based on Structural Information Principles

Abstract

Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interactions with the environment, their effectiveness in noisy and high-dimensional scenarios typically relies on specific structural priors. In this paper, we propose SIDM, a novel and general framework for effective Decision-Making based on Structural Information principles, approached from an information-theoretic perspective. We present an unsupervised partitioning method that forms vertex communities in the state and action spaces according to their feature similarities. Within each community, an aggregation function that uses structural entropy as the vertex weight produces the community's embedding, which yields hierarchical state and action abstractions. From the abstract elements extracted from historical trajectories, we construct a directed, weighted, homogeneous transition graph, and minimizing this graph's high-dimensional structural entropy generates an optimal encoding tree. An innovative two-layer, skill-based learning mechanism then computes the common path entropy of each state transition and uses it as the transition's identified probability, removing the need for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms to enhance their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with state-of-the-art (SOTA) baselines, our framework significantly and consistently improves the policy's quality, stability, and efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.
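
The abstract relies on structural entropy and on an "optimal encoding tree" obtained by minimizing it. As a hedged illustration only, the sketch below computes the standard one- and two-dimensional structural entropy from Li and Pan's structural information theory for a small undirected, weighted graph, and finds the minimizing two-level encoding tree (a partition of the vertices into modules) by brute-force search. For a partition P = {X_1, ..., X_L} with weighted degrees d_v, module volumes vol(X_j) and cut weights g_j, it evaluates H^P(G) = -sum_j sum_{v in X_j} (d_v / vol(G)) log2(d_v / vol(X_j)) - sum_j (g_j / vol(G)) log2(vol(X_j) / vol(G)). This is not SIDM's implementation: the paper's transition graph is directed, its encoding trees may be deeper than two levels, and the example graph and every function name here are hypothetical.

```python
import math


def degrees(edges, n):
    """Weighted degree of every vertex in an undirected graph."""
    deg = [0.0] * n
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    return deg


def cut_weight(module, edges):
    """Total weight of edges with exactly one endpoint inside `module` (g_j)."""
    inside = set(module)
    return sum(w for u, v, w in edges if (u in inside) != (v in inside))


def one_dim_entropy(edges, n):
    """H^1(G) = -sum_v (d_v / vol(G)) * log2(d_v / vol(G))."""
    deg = degrees(edges, n)
    vol_g = sum(deg)
    return -sum(d / vol_g * math.log2(d / vol_g) for d in deg if d > 0)


def two_dim_entropy(edges, n, partition):
    """Structural entropy of the two-level encoding tree given by `partition`."""
    deg = degrees(edges, n)
    vol_g = sum(deg)
    h = 0.0
    for module in partition:
        vol_m = sum(deg[v] for v in module)
        if vol_m == 0:
            continue  # a module of isolated vertices contributes nothing
        # Leaf terms: uncertainty of locating a vertex within its module.
        h -= sum(deg[v] / vol_g * math.log2(deg[v] / vol_m)
                 for v in module if deg[v] > 0)
        # Module term: weighted by the chance a random walk enters from outside.
        h -= cut_weight(module, edges) / vol_g * math.log2(vol_m / vol_g)
    return h


def set_partitions(items):
    """Yield every partition of `items` (Bell-number many; tiny graphs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            # Put `first` into an existing block ...
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
N = 6

# Exhaustive search for the partition with minimum two-dimensional entropy.
best = min(set_partitions(list(range(N))),
           key=lambda p: two_dim_entropy(EDGES, N, p))
print("H1 =", round(one_dim_entropy(EDGES, N), 4))
print("best partition:", sorted(sorted(m) for m in best))
print("H2 =", round(two_dim_entropy(EDGES, N, best), 4))
```

Both the all-singletons partition and the single-module partition reduce to H^1(G), so a lower value indicates genuine community structure; here the minimum separates the two triangles. Exhaustive search is only feasible for a handful of vertices; practical encoding-tree optimizers use heuristic operators instead.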

URL

https://arxiv.org/abs/2404.09760

PDF

https://arxiv.org/pdf/2404.09760.pdf