Abstract
Multi-agent systems (MAS) must adaptively cope with dynamic environments, changing agent populations, and diverse tasks. However, most multi-agent systems cannot handle these conditions easily because of the complexity of the state and task spaces. Social impact theory regards the complex influencing factors as forces acting on an agent, originating from the environment, from other agents, and from the agent's intrinsic motivation; these are referred to as social forces. Inspired by this concept, we propose a novel gradient-based state representation for multi-agent reinforcement learning. To model the social forces non-trivially, we further introduce a data-driven method that employs denoising score matching to learn social gradient fields (SocialGFs) from offline samples, e.g., the attractive or repulsive outcomes of each force. During interactions, agents take actions based on the multi-dimensional gradients to maximize their own rewards. In practice, we integrate SocialGFs into widely used multi-agent reinforcement learning algorithms such as MAPPO. The empirical results reveal that SocialGFs offer four advantages for multi-agent systems: 1) they can be learned without requiring online interaction, 2) they transfer across diverse tasks, 3) they facilitate credit assignment under challenging reward settings, and 4) they scale with an increasing number of agents.
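To make the core idea concrete, here is a minimal toy sketch of denoising score matching for a single attractive "social force": offline samples cluster around a landmark, a score model is fit to the noise-perturbed samples, and an agent then follows the learned gradient field toward the high-density region. This is an illustrative assumption-laden toy, not the paper's implementation; the landmark position, noise scale `sigma`, linear score model, and step size are all hypothetical choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline samples of "attractive outcomes": positions near a landmark at (2, 1).
# Landmark location and spread are illustrative, not from the paper.
landmark = np.array([2.0, 1.0])
samples = landmark + 0.1 * rng.standard_normal((1000, 2))

# Denoising score matching: perturb each sample with Gaussian noise of scale
# sigma and regress a score model onto the target (x - x_tilde) / sigma**2,
# whose conditional expectation is the score of the perturbed density.
sigma = 0.5
x_tilde = samples + sigma * rng.standard_normal(samples.shape)
target = (samples - x_tilde) / sigma**2

# A linear score model s(x) = x @ W.T + b suffices for this Gaussian toy data;
# fit it in closed form by least squares on the design matrix [x_tilde, 1].
X = np.hstack([x_tilde, np.ones((len(x_tilde), 1))])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
W, b = coef[:2].T, coef[2]

def score(x):
    """Learned gradient field: points from x toward high-density regions."""
    return x @ W.T + b

# An agent far from the landmark ascends the gradient field toward it.
agent = np.array([-1.0, -1.0])
for _ in range(200):
    agent = agent + 0.05 * score(agent)
```

In the paper's setting there is one such field per social force (attractive or repulsive, per entity type), and the stacked multi-dimensional gradients feed into the state representation of an RL policy such as MAPPO rather than being followed directly.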
URL
https://arxiv.org/abs/2405.01839