Multi-Agent Large Language Models for Conversational Task-Solving

2024-10-30 11:38:13
Jonas Becker

Abstract

In an era where single large language models have dominated the landscape of artificial intelligence for years, multi-agent systems arise as new protagonists in conversational task-solving. While previous studies have showcased their potential in reasoning tasks and creative endeavors, an analysis of their limitations concerning the conversational paradigms and the impact of individual agents is missing. It remains unascertained how multi-agent discussions perform across tasks of varying complexity and how the structure of these conversations influences the process. To fill that gap, this work systematically evaluates multi-agent systems across various discussion paradigms, assessing their strengths and weaknesses in both generative tasks and question-answering tasks. Alongside the experiments, I propose a taxonomy of 20 multi-agent research studies from 2022 to 2024, followed by the introduction of a framework for deploying multi-agent LLMs in conversational task-solving. I demonstrate that while multi-agent systems excel in complex reasoning tasks, outperforming a single model by leveraging expert personas, they fail on basic tasks. Concretely, I identify three challenges that arise: 1) While longer discussions enhance reasoning, agents fail to maintain conformity to strict task requirements, which leads to problem drift, making shorter conversations more effective for basic tasks. 2) Prolonged discussions risk alignment collapse, raising new safety concerns for these systems. 3) I showcase discussion monopolization through long generations, posing the problem of fairness in decision-making for tasks like summarization. This work uncovers both the potential and challenges that arise with multi-agent interaction and varying conversational paradigms, providing insights into how future research could improve the efficiency, performance, and safety of multi-agent LLMs.

URL

https://arxiv.org/abs/2410.22932

PDF

https://arxiv.org/pdf/2410.22932.pdf

