Abstract
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach in which the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which can be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme that enables a single agent to learn MFNE from online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q) while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that both efficiently approximate FPI, and we provide a sample complexity guarantee. The efficacy of our methods is confirmed by numerical experiments.
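The abstract's core idea, updating the value function Q and the mean field estimate M from the same batch of observations, can be sketched in a minimal tabular form. This is an illustrative assumption of what one joint update step might look like (the environment, learning rates, and update rules below are hypothetical, not the paper's exact algorithm):

```python
import numpy as np

n_states, n_actions = 5, 3

Q = np.zeros((n_states, n_actions))   # action-value estimate
M = np.ones(n_states) / n_states      # mean field (state distribution) estimate
alpha, beta, gamma = 0.1, 0.05, 0.99  # Q step size, M step size, discount

def qm_update(batch):
    """One joint update of Q and M from a batch of (s, a, r, s') transitions."""
    counts = np.zeros(n_states)
    for s, a, r, s_next in batch:
        # Q update: temporal-difference step toward the Bellman target.
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        counts[s_next] += 1
    # M update: move the mean field estimate toward the empirical
    # state distribution observed in the same batch of samples.
    if counts.sum() > 0:
        M[:] = (1 - beta) * M + beta * counts / counts.sum()

# Fabricated example batch of (state, action, reward, next_state) samples.
batch = [(0, 1, 1.0, 2), (2, 0, 0.5, 3), (3, 2, -0.2, 0)]
qm_update(batch)
```

The key point mirrored from the abstract is that no model of the transition dynamics or reward is used: both Q and M are refined purely from sampled transitions, so a single agent can run this online.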
URL
https://arxiv.org/abs/2405.03718