Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

2024-04-19 10:00:34
Qiang He, Tianyi Zhou, Meng Fang, Setareh Maghsudi

Abstract

Representation rank, which measures the expressive capacity of value networks, is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement Learning (DRL). Existing studies focus on unboundedly maximizing this rank; however, that approach introduces overly complex models into learning and thus undermines performance. Hence, fine-tuning the representation rank is a challenging and crucial optimization problem. To address this issue, we find a guiding principle for adaptive control of the representation rank. We employ the Bellman equation as a theoretical foundation and derive an upper bound on the cosine similarity between the value-network representations of consecutive state-action pairs. We then leverage this upper bound to propose a novel regularizer, the BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent's performance. We first validate the effectiveness of automatic rank control in illustrative experiments. Then, we scale up BEER to complex continuous control tasks by combining it with the deterministic policy gradient method. On 12 challenging DeepMind control tasks, BEER outperforms the baselines by a large margin. In addition, BEER demonstrates significant advantages in Q-value approximation. Our code is available at this https URL.
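
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a penalty that activates only when the cosine similarity between consecutive state-action representations exceeds a given upper bound. It is an illustration under assumptions, not the authors' implementation: the abstract does not state the form of the bound, so `upper_bound` is a hypothetical input supplied by the caller, and the function names and the hinge-style penalty are likewise assumptions.

```python
import numpy as np


def cosine_similarity(phi, phi_next, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) feature matrices."""
    dot = np.sum(phi * phi_next, axis=1)
    norms = np.linalg.norm(phi, axis=1) * np.linalg.norm(phi_next, axis=1)
    return dot / (norms + eps)  # eps guards against division by zero


def similarity_penalty(phi, phi_next, upper_bound):
    """Mean hinge penalty on similarity in excess of the supplied bound.

    `upper_bound` is a hypothetical placeholder (scalar or per-sample array);
    the bound derived in the paper is not reproduced here.
    """
    excess = cosine_similarity(phi, phi_next) - upper_bound
    return float(np.mean(np.maximum(excess, 0.0)))


# Toy usage: the first pair is identical (similarity near 1),
# the second pair is orthogonal (similarity 0).
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[1.0, 0.0], [1.0, 0.0]])
penalty = similarity_penalty(phi, phi_next, upper_bound=0.5)
```

Because the penalty is zero whenever the similarity stays below the bound, a term of this shape leaves the representations alone in that regime and only pushes them apart when they become more aligned than the bound allows, which is one way to read "adaptive" regularization.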

URL

https://arxiv.org/abs/2404.12754

PDF

https://arxiv.org/pdf/2404.12754.pdf
