Abstract
In the real world, the strong episode-resetting mechanisms needed to train agents in simulation are unavailable. This \textit{resetting} assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires additional handcrafted mechanisms or human interventions. Recent work aims to train a (\textit{forward}) agent with learned resets by constructing a second (\textit{backward}) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.
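The switching idea in the abstract can be illustrated with a minimal sketch: the active controller (forward or backward) hands over control once its estimated probability of reaching its current goal is high enough. The class name, the threshold rule, and the source of the confidence estimate are illustrative assumptions here, not the paper's actual implementation.

```python
class IntelligentSwitcher:
    """Hypothetical sketch of confidence-based switching between a
    forward (task-solving) agent and a backward (resetting) agent.
    The confidence value would come from a learned estimate of goal
    success; here it is simply passed in by the caller."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.active = "forward"  # start with the task-solving agent

    def maybe_switch(self, success_confidence):
        """Hand control to the other agent when the active agent is
        sufficiently confident it will achieve its current goal.
        Returns True if a switch occurred."""
        if success_confidence >= self.threshold:
            self.active = (
                "backward" if self.active == "forward" else "forward"
            )
            return True
        return False
```

In a reset-free training loop, this switcher would run at every step, so episodes end when the agent is likely to succeed rather than at a fixed horizon or via an external reset.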
URL
https://arxiv.org/abs/2405.01684