Abstract
Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. However, despite their ease of interpretation, these weights are not faithful to the models' decisions, as they form only one component of an encoder layer, and the other components can have a considerable impact on information mixing in the output representations. In this work, by expanding the scope of analysis to the whole encoder block, we propose Value Zeroing, a novel context mixing score customized for Transformers that provides a deeper understanding of how information is mixed at each encoder layer. We demonstrate the superiority of our context mixing score over other analysis methods through a series of complementary evaluations from different viewpoints, based on linguistically informed rationales, probing, and faithfulness analysis.
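The core idea named in the abstract (zeroing a token's value vector and measuring how much each output representation changes) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a toy single-head attention sublayer with a residual connection and omits layer normalization and the feed-forward sublayer, which the full encoder block would include. All weight names and the row normalization are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def toy_encoder_layer(X, Wq, Wk, Wv, Wo, zero_value_of=None):
    """Single-head attention sublayer with a residual connection.
    If zero_value_of is a token index j, token j's value vector is zeroed,
    so j can no longer contribute content to the other tokens' outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if zero_value_of is not None:
        V = V.copy()
        V[zero_value_of] = 0.0
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V @ Wo + X  # residual connection

def value_zeroing_scores(X, Wq, Wk, Wv, Wo):
    """S[i, j]: how much token i's output representation changes
    (cosine distance) when token j's value vector is zeroed."""
    n = X.shape[0]
    base = toy_encoder_layer(X, Wq, Wk, Wv, Wo)
    S = np.zeros((n, n))
    for j in range(n):
        alt = toy_encoder_layer(X, Wq, Wk, Wv, Wo, zero_value_of=j)
        for i in range(n):
            cos = (base[i] @ alt[i]) / (
                np.linalg.norm(base[i]) * np.linalg.norm(alt[i]) + 1e-9
            )
            S[i, j] = 1.0 - cos  # cosine distance, >= 0 when cos <= 1
    # normalize each row so scores sum to 1 (an illustrative choice)
    return S / (S.sum(axis=1, keepdims=True) + 1e-9)

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
S = value_zeroing_scores(X, Wq, Wk, Wv, Wo)
```

Each row of `S` can then be read as a context mixing profile for one token: unlike raw attention weights, it reflects the effect of zeroing a value vector on the layer's actual outputs, residual path included.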
URL
https://arxiv.org/abs/2301.12971