Abstract
We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation. In many of the attention heads, we frequently find sequences of consecutive states attending to the same position, which resemble syntactic phrases. We propose a transparent deterministic method of quantifying the amount of syntactic information present in the self-attentions, based on automatically building and evaluating phrase-structure trees from the phrase-like sequences. We compare the resulting trees to existing constituency treebanks, both manually and by computing precision and recall.
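
To make the span-extraction idea concrete, here is a minimal sketch (our own illustration, not the paper's code) of finding "phrase-like" sequences in a single self-attention head: runs of consecutive states whose attention mass concentrates on the same position. The function name `phrase_like_spans` and the `min_len` threshold are illustrative assumptions.

```python
import numpy as np

def phrase_like_spans(attn: np.ndarray, min_len: int = 2):
    """attn: [seq_len, seq_len] attention weights of one head
    (rows = attending states, columns = attended positions).
    Returns (start, end, attended_pos) spans, end exclusive."""
    argmax = attn.argmax(axis=1)          # position each state attends to most
    spans, start = [], 0
    for i in range(1, len(argmax) + 1):
        # close the current run when the attended position changes
        if i == len(argmax) or argmax[i] != argmax[start]:
            if i - start >= min_len:      # keep runs of at least min_len states
                spans.append((start, i, int(argmax[start])))
            start = i
    return spans

# Toy example: states 1-3 all attend mostly to position 4,
# so together they form one phrase-like span.
attn = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.60],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.10, 0.10, 0.10, 0.10, 0.60],
    [0.10, 0.10, 0.10, 0.60, 0.10],
])
print(phrase_like_spans(attn))  # [(1, 4, 4)]
```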
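The evaluation step compares the induced trees to treebank parses. The following sketch shows bracket precision and recall under the simplifying assumption that both trees are represented as sets of (start, end) constituent spans; this is a generic bracket-scoring routine, not the paper's exact evaluation setup.

```python
def bracket_prf(predicted: set, gold: set):
    """predicted, gold: sets of (start, end) constituent spans."""
    matched = len(predicted & gold)       # spans present in both trees
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# E.g. spans built from the attention heads vs. spans of a treebank parse:
pred = {(0, 2), (2, 5), (0, 5)}
gold = {(0, 2), (3, 5), (0, 5)}
print(bracket_prf(pred, gold))  # (0.667, 0.667, 0.667), approximately
```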
URL
https://arxiv.org/abs/1906.01958