Learning Roles with Emergent Social Value Orientations

2023-01-31 17:54:09
Wenhao Li, Xiangfeng Wang, Bo Jin, Jingyi Lu, Hongyuan Zha

Abstract

Social dilemmas can be considered situations where individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has leveraged ideas from social science, such as social value orientations (SVO), to solve social dilemmas in complex cooperative tasks. In this paper, by first introducing the typical "division of labor or roles" mechanism in human society, we provide a promising solution for intertemporal social dilemmas (ISD) with SVOs. A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the social value orientation emergence, which is symmetrically solved by endowing agents with altruism to share rewards with other agents. An SVO-based role embedding space is then constructed by individual conditioning policies on roles with a novel rank regularizer and mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs with different complexity.
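
The abstract describes agents being endowed with altruism so that they share rewards with other agents. As a rough illustration only, the sketch below shows one generic way such sharing could be expressed: a row-stochastic weight matrix that redistributes per-step rewards. The function name `share_rewards`, the `weights` matrix, and the rule that each row sums to 1 are assumptions made for this example; they are not taken from the paper and may differ from how RESVO actually defines or learns its sharing scheme.

```python
from typing import List


def share_rewards(rewards: List[float], weights: List[List[float]]) -> List[float]:
    """Redistribute per-agent rewards (hypothetical scheme, not the paper's).

    weights[i][j] is the fraction of agent i's reward handed to agent j.
    Each row must be non-negative and sum to 1, so total reward is conserved.
    """
    n = len(rewards)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    for row in weights:
        if any(w < 0 for w in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must be non-negative and sum to 1")
    # Agent j receives the share that every agent i assigns to it.
    return [sum(weights[i][j] * rewards[i] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    # Agent 0 earned 4.0 and gives half away; agent 1 earned nothing and keeps it.
    env_rewards = [4.0, 0.0]
    sharing = [[0.5, 0.5],
               [0.0, 1.0]]
    print(share_rewards(env_rewards, sharing))
```

In this toy setup a fully selfish agent would have a 1 on the diagonal of its row, while a more altruistic agent spreads weight across other columns. How such weights might relate to the SVO-based role embedding mentioned in the abstract is left open here.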

URL

https://arxiv.org/abs/2301.13812

PDF

https://arxiv.org/pdf/2301.13812.pdf
