Paper Reading AI Learner

Effective Reinforcement Learning Based on Structural Information Principles

2024-04-15 13:02:00
Xianghua Zeng, Hao Peng, Dingli Su, Angsheng Li

Abstract

Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interactions with the environment, their effectiveness in noisy and high-dimensional scenarios typically relies on specific structural priors. In this paper, we approach the problem from an information-theoretic perspective and propose a novel and general Structural Information principles-based framework for effective Decision-Making, namely SIDM. This paper presents a specific unsupervised partitioning method that forms vertex communities in the state and action spaces based on their feature similarities. Within each community, an aggregation function that uses structural entropy as the vertex weight is devised to obtain the community's embedding, thereby facilitating hierarchical state and action abstractions. By extracting abstract elements from historical trajectories, a directed, weighted, homogeneous transition graph is constructed. Minimizing this graph's high-dimensional structural entropy yields an optimal encoding tree. An innovative two-layer skill-based learning mechanism is introduced to compute the common path entropy of each state transition as its identified probability, thereby obviating the requirement for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms, enhancing their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with SOTA baselines, our framework significantly and consistently improves the policy's quality, stability, and efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.
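The abstract relies on structural entropy measured over an encoding tree, with lower entropy indicating a better hierarchical grouping of vertices. As a hedged illustration only, the sketch below computes the standard two-dimensional structural entropy (an encoding tree of height 2, i.e. a flat partition into communities) of a small undirected weighted graph, and then finds the minimizing partition by exhaustive search. It is not SIDM's implementation: the paper works on a directed transition graph, builds higher-dimensional encoding trees, and uses its own optimization procedure. The function names, the frozenset-keyed edge dictionary, the no-self-loop assumption, and the brute-force search are all assumptions made for this example.

```python
import math


def degrees(edges):
    """Weighted degree d_v of each vertex in an undirected graph.

    Assumed format: `edges` maps frozenset({u, v}) -> positive weight,
    with no self-loops.
    """
    deg = {}
    for edge, w in edges.items():
        for v in edge:
            deg[v] = deg.get(v, 0.0) + w
    return deg


def structural_entropy_2d(edges, partition):
    """Structural entropy of G under a height-2 encoding tree.

    `partition` is a list of vertex sets (communities X_j). Returns
    H^P(G) = -sum_j sum_{v in X_j} (d_v/vol(G)) log2(d_v/vol(X_j))
             -sum_j (g_j/vol(G)) log2(vol(X_j)/vol(G)),
    where g_j is the total weight of edges leaving X_j.
    """
    deg = degrees(edges)
    vol_g = sum(deg.values())
    h = 0.0
    for community in partition:
        vol_x = sum(deg.get(v, 0.0) for v in community)
        if vol_x == 0:
            continue  # isolated vertices contribute nothing
        # g_j: weight of edges with exactly one endpoint inside X_j
        cut = sum(w for edge, w in edges.items() if len(edge & community) == 1)
        for v in community:  # leaf (vertex) terms
            d = deg.get(v, 0.0)
            if d > 0:
                h -= d / vol_g * math.log2(d / vol_x)
        h -= cut / vol_g * math.log2(vol_x / vol_g)  # community term
    return h


def set_partitions(items):
    """Yield every partition of the list `items` as a list of sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):  # add `first` to an existing block
            yield smaller[:i] + [smaller[i] | {first}] + smaller[i + 1:]
        yield [{first}] + smaller  # or give `first` its own block


def min_entropy_partition(edges):
    """Exhaustive search (tiny graphs only) for the partition minimizing H^P."""
    vertices = sorted(degrees(edges))
    return min(set_partitions(vertices),
               key=lambda p: structural_entropy_2d(edges, p))


# Toy example: two triangles joined by the single edge (2, 3).
edges = {frozenset(e): 1.0 for e in
         [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
best = min_entropy_partition(edges)
print(sorted(sorted(c) for c in best))  # [[0, 1, 2], [3, 4, 5]]
print(round(structural_entropy_2d(edges, best), 4))  # 1.6995
```

On this toy graph the two triangles form the minimum-entropy partition, matching the intuition that low structural entropy favors communities with dense internal and sparse external connections. For the all-singletons partition, and for a single community holding every vertex, the value reduces to the one-dimensional structural entropy H1 = -sum_v (d_v/vol(G)) log2(d_v/vol(G)). The number of partitions grows with the Bell number of |V|, so the exhaustive search is only a correctness check for tiny graphs, not a practical optimizer.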

Abstract (translated)

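Although reinforcement learning (RL) algorithms acquire sequential behavioral patterns through interaction with the environment, their effectiveness in noisy and high-dimensional scenarios usually depends on specific structural priors. In this paper, we propose, from an information-theoretic perspective, a novel and general decision-making framework based on structural information principles, called SIDM. This paper presents a specific adaptive unsupervised clustering method that forms vertex communities in the state and action spaces according to their feature similarities. Within each community, an aggregation function that uses structural entropy as the vertex weight is designed to obtain the community's embedding, facilitating hierarchical state and action abstraction. By extracting abstract elements from historical trajectories, a directed, weighted, homogeneous transition graph is constructed. Minimizing this graph's high-dimensional entropy yields an optimal encoding tree. An innovative two-layer skill-based learning mechanism is introduced that computes the common path entropy of each state transition as its identified probability, thereby removing the need for expert knowledge. In addition, SIDM can be flexibly applied to various single-agent and multi-agent RL algorithms to improve their performance. Finally, extensive evaluations on challenging benchmarks show that, compared with the current best baselines, our framework significantly and consistently improves policy quality, stability, and efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.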

URL

https://arxiv.org/abs/2404.09760

PDF

https://arxiv.org/pdf/2404.09760.pdf

