Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

2025-06-11 02:12:34
Arjun Vaithilingam Sudhakar

Abstract

Modern Large Language Models (LLMs) exhibit impressive zero-shot and few-shot generalization capabilities across complex natural language tasks, enabling their widespread use as virtual assistants for diverse applications such as translation and summarization. Despite being trained solely on large corpora of text without explicit supervision on author intent, LLMs appear to infer the underlying meaning of textual interactions. This raises a fundamental question: can LLMs model and reason about the intentions of others, i.e., do they possess a form of theory of mind? Understanding others' intentions is crucial for effective collaboration, which underpins human societal success and is essential for cooperative interactions among multiple agents, including humans and autonomous systems. In this work, we investigate the theory of mind in LLMs through the lens of cooperative multi-agent reinforcement learning (MARL), where agents learn to collaborate via repeated interactions, mirroring human social reasoning. Our approach aims to enhance artificial agents' ability to adapt and cooperate with both artificial and human partners. By leveraging LLM-based agents capable of natural language interaction, we move towards creating hybrid human-AI systems that can foster seamless collaboration, with broad implications for the future of human-artificial interaction.

URL

https://arxiv.org/abs/2506.09331

PDF

https://arxiv.org/pdf/2506.09331.pdf

