AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning

2023-01-24 22:51:29
Tairan He, Weiye Zhao, Changliu Liu

Abstract

Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To address it, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for this failure, and the analysis suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare augmented agents, which use our cost function to provide additive intrinsic costs, with baseline agents that use the same policy learners but only extrinsic costs. Results show that, in all environments, the converged policies trained with intrinsic costs achieve zero constraint violation and performance comparable to the baselines.
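
The idea of an additive intrinsic cost can be illustrated with a minimal sketch. This is not the authors' implementation: AutoCost searches for the cost function automatically, whereas the function `example_intrinsic_cost` below is a hand-written, hypothetical stand-in, and the names `augmented_cost`, `discounted_cost_return`, and the `margin` parameter are assumptions made only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]


def example_intrinsic_cost(state: State, margin: float = 1.0) -> float:
    """Hypothetical intrinsic cost: positive when the agent is within
    `margin` of a hazard assumed to sit at the origin, zero otherwise."""
    distance = (state[0] ** 2 + state[1] ** 2) ** 0.5
    return max(0.0, margin - distance)


def augmented_cost(
    extrinsic_cost: float,
    state: State,
    intrinsic_cost_fn: Callable[[State], float],
) -> float:
    """Add the intrinsic cost to the cost reported by the environment."""
    return extrinsic_cost + intrinsic_cost_fn(state)


def discounted_cost_return(costs: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted sum of per-step costs, the quantity a constrained RL
    learner keeps below the cost limit (zero in the zero-violation case)."""
    total = 0.0
    for cost in reversed(costs):
        total = cost + gamma * total
    return total
```

In this sketch the policy learner is unchanged; only the cost signal it receives differs, which mirrors the comparison described in the abstract between agents with and without intrinsic costs.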

URL

https://arxiv.org/abs/2301.10339

PDF

https://arxiv.org/pdf/2301.10339.pdf

