Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning

Abstract

Being able to harness the power of large, static datasets for developing autonomous multi-agent systems could unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. However, in industry, distributed system processes can often be recorded during operation, and large quantities of demonstrative data can be stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective online controllers from static datasets. However, offline MARL is still in its infancy, and, therefore, lacks standardised benchmarks, baselines and evaluation protocols typically found in more mature subfields of RL. This deficiency makes it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing off-the-grid MARL (OG-MARL): a framework for generating offline MARL datasets and algorithms. We release an initial set of datasets and baselines for cooperative offline MARL, created using the framework, along with a standardised evaluation protocol. Our datasets provide settings that are characteristic of real-world systems, including complex dynamics, non-stationarity, partial observability, suboptimality and sparse rewards, and are generated from popular online MARL benchmarks. We hope that OG-MARL will serve the community and help steer progress in offline MARL, while also providing an easy entry point for researchers new to the field.

URL

https://arxiv.org/abs/2302.00521

PDF

https://arxiv.org/pdf/2302.00521.pdf