Abstract
Recent work has shown the promise of creating generalist, transformer-based policies for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is therefore of interest whether we can more flexibly create generalist policies by merging multiple task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, i.e., averaging in weight space, subsets of Decision Transformers trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also propose that merging yields better results when all policies start from a common, pre-trained initialization and are co-trained on shared auxiliary tasks during problem-specific finetuning. In general, we believe research in this direction can help democratize and distribute the process by which generally capable agents are formed.
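The core merging operation the abstract describes, uniform averaging of aligned parameters across individually trained policies, can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name and the toy scalar-parameter "policies" are hypothetical, and it assumes all policies share an architecture (and, as the abstract suggests, a common pre-trained initialization) so parameters align key-by-key.

```python
def merge_policies(state_dicts):
    """Uniformly average parameter values across aligned state dicts.

    Assumes every state dict has identical keys (same architecture),
    as when all policies are finetuned from one shared initialization.
    """
    keys = state_dicts[0].keys()
    assert all(sd.keys() == keys for sd in state_dicts), "architectures must match"
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}

# Toy example with scalar "parameters" standing in for tensors:
policy_a = {"layer.weight": 1.0, "layer.bias": 0.0}
policy_b = {"layer.weight": 3.0, "layer.bias": 2.0}
merged = merge_policies([policy_a, policy_b])
# merged == {"layer.weight": 2.0, "layer.bias": 1.0}
```

In practice the values would be parameter tensors (e.g., from a PyTorch `state_dict`), and the same element-wise average applies; merging subsets of policies, as in the paper, just means passing only the chosen subset's state dicts.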
URL
https://arxiv.org/abs/2303.07551