Abstract
Reinforcement learning (RL) is an area of significant research interest, and safe RL in particular is attracting attention because it can handle the safety-driven constraints that are crucial for real-world applications. This work proposes a novel approach to RL training, called control invariant set (CIS) enhanced RL, which leverages the explicit form of the CIS to improve stability guarantees and sampling efficiency. Furthermore, the robustness of the proposed approach is investigated in the presence of uncertainty. The approach consists of two learning stages: offline and online. In the offline stage, the CIS is incorporated into the reward design, initial state sampling, and state reset procedures, which improves sampling efficiency during offline training. In the online stage, a Safety Supervisor examines the safety of each action and makes necessary corrections; whenever the predicted next state falls outside the CIS, which serves as the stability criterion, the RL agent is retrained. The stability analysis is conducted for both cases, with and without uncertainty. To evaluate the proposed approach, we apply it to a simulated chemical reactor. The results show a significant improvement in sampling efficiency during offline training and a closed-loop stability guarantee in the online implementation, with and without uncertainty.
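The two-stage idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the explicit CIS is a simple box in state space, a hypothetical penalty-based reward shaping for the offline stage, and a toy one-step prediction model for the online Safety Supervisor; all names (`in_cis`, `safety_supervisor`, `backup_action`) are illustrative.

```python
import numpy as np

# Assumed explicit CIS: a box in state space (illustrative only; the
# paper's CIS would come from a control invariant set computation).
CIS_LOW, CIS_HIGH = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

def in_cis(x):
    """Membership test for the explicit control invariant set."""
    return bool(np.all(x >= CIS_LOW) and np.all(x <= CIS_HIGH))

def shaped_reward(x, base_reward):
    """Offline stage: penalize states outside the CIS (assumed penalty form)."""
    return base_reward if in_cis(x) else base_reward - 10.0

def sample_initial_state(rng):
    """Offline stage: draw initial (and reset) states from inside the CIS."""
    return rng.uniform(CIS_LOW, CIS_HIGH)

def predict(x, u):
    """Toy one-step linear prediction model, for illustration only."""
    return 0.9 * x + 0.1 * u

def safety_supervisor(x, action, predict_fn, backup_action):
    """Online stage: if the predicted next state leaves the CIS,
    substitute a safe backup action and flag the event (which would
    also trigger retraining in the proposed approach)."""
    if in_cis(predict_fn(x, action)):
        return action, False
    return backup_action, True
```

For example, an RL action that would drive the predicted state out of the box is replaced by the backup action and the unsafe event is flagged, while actions whose predicted successor stays inside the CIS pass through unchanged.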
URL
https://arxiv.org/abs/2305.15602