Abstract
Training visual reinforcement learning (RL) agents in practical scenarios presents a significant challenge: RL agents suffer from low sample efficiency in environments with visual variations. While various approaches have attempted to alleviate this issue through disentangled representation learning, these methods usually learn from scratch without prior knowledge of the world. This paper, in contrast, learns underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints. To enable effective cross-domain semantic knowledge transfer, we introduce an interpretable model-based RL framework, dubbed Disentangled World Models (DisWM). Specifically, we pretrain an action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. The disentanglement capability of the pretrained model is then transferred to the world model through latent distillation. During online finetuning, we exploit the knowledge of the pretrained model and introduce a disentanglement constraint to the world model. In the adaptation phase, incorporating actions and rewards from online environment interactions enriches the data diversity, which in turn strengthens disentangled representation learning. Experimental results validate the superiority of our approach on various benchmarks.
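The abstract does not specify the exact loss functions, so the following is only a minimal PyTorch-style sketch of the two ideas it names: a disentanglement regularizer (assumed here to be a beta-weighted KL to a factorized prior, in the spirit of beta-VAE) and an offline-to-online latent distillation term (assumed here to be a simple L2 match between the world model's latents and those of a frozen pretrained video model). All names (`LatentEncoder`, `disentanglement_kl`, `latent_distillation`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentEncoder(nn.Module):
    """Toy convolutional encoder producing a diagonal-Gaussian latent."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(2 * latent_dim)

    def forward(self, obs):
        stats = self.head(self.conv(obs))
        mean, log_std = stats.chunk(2, dim=-1)
        return mean, log_std

def disentanglement_kl(mean, log_std, beta=4.0):
    """Beta-weighted KL to a factorized standard-normal prior,
    encouraging statistically independent latent dimensions."""
    kl = 0.5 * (mean.pow(2) + (2 * log_std).exp() - 2 * log_std - 1.0)
    return beta * kl.sum(dim=-1).mean()

def latent_distillation(student_mean, teacher_mean):
    """Match the online world model's latents to those of the frozen,
    action-free video prediction model pretrained offline."""
    return F.mse_loss(student_mean, teacher_mean.detach())

# Usage sketch: both terms would be added to the usual world-model losses.
obs = torch.rand(8, 3, 64, 64)                 # batch of observations
teacher, student = LatentEncoder(), LatentEncoder()
with torch.no_grad():                          # teacher stays frozen
    t_mean, _ = teacher(obs)
s_mean, s_log_std = student(obs)
loss = disentanglement_kl(s_mean, s_log_std) + latent_distillation(s_mean, t_mean)
loss.backward()
```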
URL
https://arxiv.org/abs/2503.08751