Abstract
Large language models (LLMs) exhibit remarkable capabilities not only on language tasks but also on tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. Here we ask whether similar specialization for language emerges in LLMs. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. We then establish the causal role of these units by demonstrating that ablating LLM language-selective units -- but not random units -- leads to drastic deficits on language tasks. Correspondingly, language-selective LLM units are more aligned with brain recordings from the human language system than random units are. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks in some LLMs for reasoning and social capabilities, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models, and highlight parallels with the functional organization of the human brain.
URL
https://arxiv.org/abs/2411.02280