Abstract
Variants of the BERT architecture specialised for producing full-sentence representations often achieve better performance on downstream tasks than sentence embeddings extracted from vanilla BERT. However, there is still little understanding of which input properties determine the properties of such representations. In this study, we construct several sets of sentences with pre-defined lexical and syntactic structures and show that state-of-the-art sentence transformers have a strong nominal-participant-set bias: cosine similarities between pairs of sentences are determined far more strongly by the overlap in their sets of noun participants than by sharing the same predicates, lengthy nominal modifiers, or adjuncts. At the same time, the precise syntactic-thematic functions of the participants are largely irrelevant.
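A minimal sketch of the kind of comparison the abstract describes, assuming the sentence-transformers library; the model name "all-MiniLM-L6-v2" and the example sentences are illustrative choices, not taken from the paper:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

anchor = "The lawyer sent the doctor a letter."
same_participants = "The doctor sued the lawyer."          # same noun set, different predicate and roles
same_predicate = "The teacher sent the student a letter."  # same predicate, different noun set

# Encode all three sentences and compare cosine similarities to the anchor.
emb = model.encode([anchor, same_participants, same_predicate], convert_to_tensor=True)
print("same participants:", util.cos_sim(emb[0], emb[1]).item())
print("same predicate:   ", util.cos_sim(emb[0], emb[2]).item())
# Under the nominal-participant-set bias reported in the paper,
# the first similarity would be expected to come out higher.
```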
URL
https://arxiv.org/abs/2301.13039