Paper Reading AI Learner

Multi-Agent congestion cost minimization with linear function approximation

2023-01-26 08:45:44
Prashant Trivedi, Nandyala Hemachandra

Abstract

This work considers multiple agents traversing a network from a source node to the goal node. The cost to an agent for traveling a link has a private as well as a congestion component. The agent's objective is to find a path to the goal node with minimum overall cost in a decentralized way. We model this as a fully decentralized multi-agent reinforcement learning problem and propose a novel multi-agent congestion cost minimization (MACCM) algorithm. Our MACCM algorithm uses linear function approximations of transition probabilities and the global cost function. In the absence of a central controller and to preserve privacy, agents communicate the cost function parameters to their neighbors via a time-varying communication network. Moreover, each agent maintains its estimate of the global state-action value, which is updated via a multi-agent extended value iteration (MAEVI) sub-routine. We show that our MACCM algorithm achieves a sub-linear regret. The proof requires the convergence of cost function parameters, the MAEVI algorithm, and analysis of the regret bounds induced by the MAEVI triggering condition for each agent. We implement our algorithm on a two node network with multiple links to validate it. We first identify the optimal policy, the optimal number of agents going to the goal node in each period. We observe that the average regret is close to zero for 2 and 3 agents. The optimal policy captures the trade-off between the minimum cost of staying at a node and the congestion cost of going to the goal node. Our work is a generalization of learning the stochastic shortest path problem.

Abstract (translated)

这项工作考虑了多个agent从源节点到目标节点穿越网络的问题。每个agent前往一个link的成本包括私人成本和拥堵成本。agent的目标是以最小总成本的方式找到通往目标节点的路径,我们将其建模为完全分散化的多agent reinforcement learning问题,并提出了一种新的多agent拥堵成本最小化(MACCM)算法。我们的MACCM算法使用线性函数近似过渡概率和全局成本函数。在没有中央控制器和保护隐私的情况下,agent通过时间变化的沟通网络向邻居发送成本函数参数。此外,每个agent维持其对全局状态-行动价值的主观估计,该估计通过多agent扩展的价值迭代(MAEVI)子程序更新。我们证明了我们的MACCM算法实现了 sub-linear 的 regret。证明需要成本函数参数的收敛、MAEVI算法的收敛和分析每个agent的MAEVI触发条件引起的 regret 边界。我们在一个有两个节点的网络上实现了我们的算法来验证它。我们首先确定最优策略,每个时期前往目标节点的最优agent数量。我们观察到,2和3个agent的平均 regret接近于零。最优策略捕捉了最小成本停留在节点和前往目标节点的拥堵成本之间的权衡。我们的工作是学习随机最短路径问题的扩展。

URL

https://arxiv.org/abs/2301.10993

PDF

https://arxiv.org/pdf/2301.10993.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot