Abstract
Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning in real-world decision-making applications. In particular, we require algorithms that can guarantee robust, safe performance in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In this work, we propose a safe reinforcement learning framework with robustness guarantees through the use of an optimal transport cost uncertainty set. We provide an efficient, theoretically supported implementation based on Optimal Transport Perturbations, which can be applied in a completely offline fashion using only data collected in a nominal training environment. We demonstrate the robust, safe performance of our approach on a variety of continuous control tasks with safety constraints in the Real-World Reinforcement Learning Suite.
URL
https://arxiv.org/abs/2301.13375