Paper Reading AI Learner

A Single Online Agent Can Efficiently Learn Mean Field Games

2024-05-05 16:38:04
Chenyu Zhang, Xu Chen, Xuan Di

Abstract

Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach in which the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which can be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme that enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q) while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that both efficiently approximate FPI and provide a sample complexity guarantee. The efficacy of our methods is confirmed by numerical experiments.
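The core idea in the abstract — updating the value function Q and the mean field estimate M from the same stream of online observations — can be illustrated with a minimal sketch. Everything below is hypothetical (the state/action sizes, the crowding-penalty reward, the transition rule, and the two step sizes are illustrative assumptions, not the paper's actual setup): a tabular Q-learning update is interleaved with an exponential-average update of M toward the empirical state-visitation distribution.

```python
import numpy as np

# Illustrative finite MDP; all names and dynamics here are assumptions,
# not taken from the paper.
rng = np.random.default_rng(0)
S, A = 5, 3                      # number of states and actions
gamma, alpha_q, alpha_m = 0.9, 0.1, 0.05

Q = np.zeros((S, A))             # action-value estimates
M = np.full(S, 1.0 / S)         # mean field estimate (distribution over states)

def reward(s, a, m):
    # Hypothetical mean-field reward: penalize crowded states.
    return -m[s] + 0.1 * a

def step(s, a):
    # Hypothetical deterministic transition: action shifts the state.
    return (s + a - 1) % S

def softmax_policy(q_row, temp=1.0):
    z = np.exp((q_row - q_row.max()) / temp)
    return z / z.sum()

s = 0
for t in range(1000):
    a = rng.choice(A, p=softmax_policy(Q[s]))
    s_next = step(s, a)
    r = reward(s, a, M)
    # Q update (policy improvement) from the online sample...
    Q[s, a] += alpha_q * (r + gamma * Q[s_next].max() - Q[s, a])
    # ...and M update from the *same* observation: move the mean field
    # estimate toward the empirical state-visitation distribution.
    e = np.zeros(S)
    e[s_next] = 1.0
    M = (1 - alpha_m) * M + alpha_m * e
    s = s_next
```

Since each M update is a convex combination of two probability distributions, M remains a valid distribution throughout; the single agent thus tracks the population's state distribution without ever observing the full spatial domain, which is the limitation of FPI the paper targets.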

Abstract (translated)

平均场博弈（MFG）是一种对大规模群体系统行为进行建模的有前景的框架。然而，由于前向种群演化与后向智能体动态相互耦合，求解MFG颇具挑战性。通常，求取平均场纳什均衡（MFNE）采用一种交替求解前向与后向过程的迭代方法，称为不动点迭代（FPI）。该方法需要在整个空间域上完全观测种群传播和智能体动态，这在某些现实场景中可能不切实际。为克服这一局限，本文提出一种新颖的在线单智能体无模型学习方案，使单个智能体无需状态-动作空间、奖励函数或转移动态的先验知识，即可利用在线样本学习MFNE。具体而言，智能体通过价值函数（Q）更新其策略，同时利用同一批观测评估平均场状态（M）。我们开发了该学习方案的两个变体：离线策略（off-policy）与在线策略（on-policy）QM迭代。我们证明了二者能高效逼近FPI，并给出了样本复杂度保证。数值实验验证了我们方法的有效性。

URL

https://arxiv.org/abs/2405.03718

PDF

https://arxiv.org/pdf/2405.03718.pdf
