Paper Reading AI Learner

Control invariant set enhanced safe reinforcement learning: improved sampling efficiency, guaranteed stability and robustness

2023-05-24 22:22:19
Song Bo, Bernard T. Agyeman, Xunyuan Yin, Jinfeng Liu (University of Alberta)

Abstract

Reinforcement learning (RL) is an area of significant research interest, and safe RL in particular is attracting attention due to its ability to handle safety-driven constraints that are crucial for real-world applications. This work proposes a novel approach to RL training, called control invariant set (CIS) enhanced RL, which leverages the explicit form of the CIS to improve stability guarantees and sampling efficiency. Furthermore, the robustness of the proposed approach is investigated in the presence of uncertainty. The approach consists of two learning stages: offline and online. In the offline stage, the CIS is incorporated into the reward design, initial state sampling, and state reset procedures, which improves sampling efficiency during offline training. In the online stage, a Safety Supervisor is introduced to examine the safety of each action and make necessary corrections; RL is retrained whenever the predicted next-step state falls outside the CIS, which serves as the stability criterion. The stability analysis is conducted for both cases, with and without uncertainty. To evaluate the proposed approach, we apply it to a simulated chemical reactor. The results show a significant improvement in sampling efficiency during offline training and a closed-loop stability guarantee in the online implementation, both with and without uncertainty.
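The two stages described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ellipsoidal CIS, the toy linear model, the penalty constant, and all function names are assumptions introduced for clarity.

```python
import numpy as np

# Hypothetical ellipsoidal CIS in explicit form: {x : x' P x <= 1}.
# The paper's CIS and process model are not given here; these stand in.
P = np.diag([1.0, 4.0])

def in_cis(x):
    """Membership test using the explicit form of the CIS."""
    return float(x @ P @ x) <= 1.0

def predict_next(x, u):
    """Toy one-step model x+ = A x + B u standing in for the process model."""
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([0.0, 0.5])
    return A @ x + B * u

def shaped_reward(x, base_reward):
    """Offline stage: CIS membership enters the reward design
    (illustrative fixed penalty for leaving the CIS)."""
    return base_reward if in_cis(x) else base_reward - 10.0

def safety_supervisor(x, u_rl, backup_controller):
    """Online stage: accept the RL action only if the predicted next
    state stays inside the CIS; otherwise substitute a safe backup
    action and flag the event so the RL agent can be retrained."""
    if in_cis(predict_next(x, u_rl)):
        return u_rl, False               # action is safe, no retraining
    return backup_controller(x), True    # corrected action, retrain flag
```

In the offline stage, `in_cis` would also drive initial-state sampling (draw start states inside the CIS) and state resets (reset an episode once the state leaves the CIS); the online retraining is triggered exactly when `safety_supervisor` returns a `True` flag.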

URL

https://arxiv.org/abs/2305.15602

PDF

https://arxiv.org/pdf/2305.15602.pdf

