From Grunts to Grammar: Emergent Language from Cooperative Foraging

2025-05-19 08:57:30
Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang

Abstract

Early cavemen relied on gestures, vocalizations, and simple signals to coordinate, plan, avoid predators, and share resources. Today, humans collaborate using complex languages to achieve remarkable results. What drives this evolution in communication? How does language emerge, adapt, and become vital for teamwork? Understanding the origins of language remains a challenge. A leading hypothesis in linguistics and anthropology posits that language evolved to meet the ecological and social demands of early human cooperation. Language did not arise in isolation, but through shared survival goals. Inspired by this view, we investigate the emergence of language in multi-agent Foraging Games. These environments are designed to reflect the cognitive and ecological constraints believed to have influenced the evolution of communication. Agents operate in a shared grid world with only partial knowledge about other agents and the environment, and must coordinate to complete games like picking up high-value targets or executing temporally ordered actions. Using end-to-end deep reinforcement learning, agents learn both actions and communication strategies from scratch. We find that agents develop communication protocols with hallmark features of natural language: arbitrariness, interchangeability, displacement, cultural transmission, and compositionality. We quantify each property and analyze how different factors, such as population size and temporal dependencies, shape specific aspects of the emergent language. Our framework serves as a platform for studying how language can evolve from partial observability, temporal reasoning, and cooperative goals in embodied multi-agent settings. We will release all data, code, and models publicly.

URL

https://arxiv.org/abs/2505.12872

PDF

https://arxiv.org/pdf/2505.12872.pdf

