Abstract
Social dilemmas are situations in which individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has drawn on ideas from social science, such as social value orientations (SVOs), to solve social dilemmas in complex cooperative tasks. In this paper, we introduce the "division of labor or roles" mechanism typical of human society and use it to provide a promising solution for intertemporal social dilemmas (ISDs) with SVOs. We propose a novel learning framework, Learning Roles with Emergent SVOs (RESVO), which transforms the learning of roles into the emergence of social value orientations; this is symmetrically solved by endowing agents with altruism so that they share rewards with other agents. An SVO-based role embedding space is then constructed by conditioning individual policies on roles, using a novel rank regularizer and a mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs of varying complexity.
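The reward-sharing idea mentioned above can be illustrated with a minimal sketch. This is not the RESVO algorithm: the abstract does not specify how the sharing weights are represented or learned. The function name `share_rewards` and the row-stochastic matrix `sharing` are hypothetical, chosen only to show how agents that give away part of their reward change what each agent receives.

```python
import numpy as np


def share_rewards(rewards, sharing):
    """Redistribute per-agent rewards according to a sharing matrix.

    Assumption (not from the paper): sharing[i, j] is the fraction of
    agent i's reward handed to agent j, and each row sums to 1 so that
    the total reward is conserved.
    """
    rewards = np.asarray(rewards, dtype=float)
    sharing = np.asarray(sharing, dtype=float)
    n = rewards.size
    if sharing.shape != (n, n):
        raise ValueError("sharing must be an n x n matrix")
    if np.any(sharing < 0) or not np.allclose(sharing.sum(axis=1), 1.0):
        raise ValueError("each row of sharing must be non-negative and sum to 1")
    # Agent j receives the sum over i of sharing[i, j] * rewards[i].
    return sharing.T @ rewards


# Agent 0 earns 1.0 and gives half of it to agent 1; agent 1 keeps its own reward.
received = share_rewards([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]])
```

With an identity sharing matrix every agent keeps its own reward (purely selfish behavior); moving weight off the diagonal corresponds to more altruistic behavior in this toy setting.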
URL
https://arxiv.org/abs/2301.13812