Abstract
We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that hypernymy can be detected in the abstraction mechanism, and that this signal cannot be attributed solely to the distributional similarity of the noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.
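To make the pipeline described in the abstract concrete, the following is a minimal sketch of its first two steps: filling a hypernymy surface pattern with a noun pair and reading off the attention between the two nouns. Everything specific here is an assumption for illustration, not the authors' setup: the pattern string, the whitespace tokenization (BERT itself uses WordPiece), the helper names, and the uniform toy matrix that stands in for a real attention head. With an actual model, the matrix would instead come from BERT's attention outputs (for example, Hugging Face `transformers` returns them when `output_attentions=True` is set).

```python
# Hypothetical surface pattern; the abstract does not list the patterns actually used.
PATTERN = "a {hypo} is a kind of {hyper}"


def instantiate(hypo, hyper, pattern=PATTERN):
    """Fill the pattern and return the tokens plus the positions of both nouns."""
    tokens = pattern.format(hypo=hypo, hyper=hyper).split()  # whitespace split, an assumption
    return tokens, tokens.index(hypo), tokens.index(hyper)


def pair_attention(attn, i, j):
    """Average the attention weights in both directions between positions i and j."""
    return (attn[i][j] + attn[j][i]) / 2


def uniform_attention(n):
    """Toy stand-in for one attention head: each row spreads its weight evenly."""
    return [[1.0 / n] * n for _ in range(n)]


tokens, i, j = instantiate("dog", "animal")
attn = uniform_attention(len(tokens))
score = pair_attention(attn, i, j)
```

Under this sketch, a counterfactual would be built by calling `instantiate` with a noun pair that is not in a taxonomic relationship and comparing the resulting scores; how the two counterfactual sets are actually constructed is not specified in the abstract.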
URL
https://arxiv.org/abs/2404.15848