Abstract
Multi-agent reinforcement learning typically suffers from sample inefficiency, where learning suitable policies requires many data samples. Learning from external demonstrators is one way to mitigate this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms outperform baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
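To make the two-level idea concrete, below is a minimal single-agent tabular sketch, written as our own illustration rather than the paper's exact algorithm. The names (`TwoLevelQLearner`, `q_high`, `q_low`) and the specific update rule are assumptions: a high-level Q-function scores each advisor (plus the agent's own greedy policy) per state, a low-level Q-function scores environment actions, and both are updated from the same transitions, so consistently bad advisors are eventually out-valued and ignored.

```python
import random
from collections import defaultdict

class TwoLevelQLearner:
    """Hypothetical sketch of two-level Q-learning with multiple advisors.

    q_low(s, a) evaluates environment actions; q_high(s, k) evaluates
    advice sources, where index len(advisors) means "follow q_low itself".
    This is a simplified single-agent illustration, not the paper's
    multi-agent algorithm.
    """

    def __init__(self, actions, advisors, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.actions = actions            # list of discrete actions
        self.advisors = advisors          # list of callables: state -> action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_low = defaultdict(float)   # (state, action) -> value
        self.q_high = defaultdict(float)  # (state, source) -> value
        self.self_idx = len(advisors)     # sentinel: trust own estimates

    def act(self, state):
        """Pick an advice source epsilon-greedily on q_high, then its action."""
        sources = list(range(len(self.advisors) + 1))
        if random.random() < self.epsilon:
            source = random.choice(sources)
        else:
            source = max(sources, key=lambda k: self.q_high[(state, k)])
        if source == self.self_idx:       # act greedily on own action values
            action = max(self.actions, key=lambda a: self.q_low[(state, a)])
        else:                             # follow the chosen advisor's advice
            action = self.advisors[source](state)
        return action, source

    def update(self, state, action, source, reward, next_state):
        """One-step TD updates for both levels from the same transition."""
        best_next = max(self.q_low[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q_low[(state, action)] += self.alpha * (
            target - self.q_low[(state, action)])
        # The high level learns which source actually yields return,
        # which is how bad advice gets ignored over time.
        self.q_high[(state, source)] += self.alpha * (
            target - self.q_high[(state, source)])
```

In a training loop one would call `act`, step the environment, then call `update` with the observed reward and next state; in the multi-agent setting described in the abstract, each agent would maintain its own pair of Q-functions over its own advisor set.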
URL
https://arxiv.org/abs/2301.11153