Abstract
Post-training algorithms based on deep reinforcement learning can push the limits of robotic models toward specific objectives such as generalizability, accuracy, and robustness. However, Intervention-requiring Failures (IR Failures), e.g., a robot spilling water or breaking fragile glass, inevitably occur during real-world exploration, hindering the practical deployment of this paradigm. To tackle this, we introduce Failure-Aware Offline-to-Online Reinforcement Learning (FARL), a new paradigm that minimizes failures during real-world reinforcement learning. We create FailureBench, a benchmark that incorporates common failure scenarios requiring human intervention, and propose an algorithm that integrates a world-model-based safety critic with a recovery policy trained offline to prevent failures during online exploration. Extensive simulation and real-world experiments demonstrate that FARL significantly reduces IR Failures while improving performance and generalization during online reinforcement learning post-training: on average, FARL reduces IR Failures by 73.1% and improves performance by 11.3% during real-world RL post-training. Videos and code are available at this https URL.
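The abstract does not spell out the algorithm, but its core mechanism, a world-model-based safety critic that vetoes risky exploratory actions and hands control to an offline-trained recovery policy, can be sketched. The following is a minimal illustration under assumed interfaces, not FARL's actual implementation; every name here (select_action, RISK_THRESHOLD, task_policy, recovery_policy, world_model, safety_critic) is hypothetical.

```python
# Hypothetical sketch of failure-aware action selection during online RL.
# A world-model-based safety critic scores the *imagined* outcome of a
# proposed action; if the predicted failure risk exceeds a threshold, an
# offline-trained recovery policy overrides the task policy.
# All names are illustrative assumptions, not FARL's API.

RISK_THRESHOLD = 0.5  # assumed hyperparameter: tolerated failure probability

def select_action(state, task_policy, recovery_policy, world_model, safety_critic):
    """Return the task policy's action unless the safety critic flags it as risky."""
    action = task_policy(state)                  # proposed exploratory action
    predicted_next = world_model(state, action)  # imagined next state, not executed
    risk = safety_critic(predicted_next)         # estimated probability of an IR Failure
    if risk > RISK_THRESHOLD:
        # Defer to the recovery policy to steer the robot back toward a safe
        # region instead of executing the risky action in the real world.
        return recovery_policy(state)
    return action
```

One design point this sketch highlights: scoring the world model's imagined next state means a risky action can be vetoed before it is ever executed on the real robot, which is what preventing (rather than merely detecting) IR Failures during online exploration requires.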
URL
https://arxiv.org/abs/2601.07821