Improving word mover's distance by leveraging self-attention matrix

2022-11-11 14:25:08

Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira

arXiv_CL

Abstract

Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it difficult to distinguish sentences with large overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences. Experiments on paraphrase identification and semantic textual similarity show that the proposed method improves WMD and its variants. Our code is available at this https URL.
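
To make the role of the structure term concrete, the following is a minimal sketch that evaluates a Fused Gromov-Wasserstein objective for a fixed transport plan. It is not the authors' implementation. The function name, the squared-difference structure loss, the weight `alpha`, and all toy numbers are assumptions made for illustration; the abstract does not specify them. The toy case is two sentences containing the same two words in swapped order.

```python
from itertools import product


def fgw_objective(M, C1, C2, T, alpha=0.5):
    """Evaluate a Fused Gromov-Wasserstein objective for a fixed coupling T.

    M     : n x m feature cost (e.g. distances between word embeddings)
    C1    : n x n structure matrix of sentence 1 (e.g. a self-attention matrix)
    C2    : m x m structure matrix of sentence 2
    T     : n x m transport plan
    alpha : assumed trade-off weight; alpha = 0 keeps only the feature term
    """
    n, m = len(M), len(M[0])
    # Feature term: the transport cost that a WMD-style alignment minimizes.
    feature = sum(M[i][j] * T[i][j] for i, j in product(range(n), range(m)))
    # Structure term: penalizes plans that map word pairs with different
    # within-sentence relations onto each other (squared loss assumed).
    structure = sum(
        (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
        for i, j, k, l in product(range(n), range(m), range(n), range(m))
    )
    return (1 - alpha) * feature + alpha * structure


# Hypothetical toy data: sentence 1 = [a, b], sentence 2 = [b, a].
M = [[1.0, 0.0], [0.0, 1.0]]       # cost 0 between identical words
C1 = [[0.0, 1.0], [0.0, 0.0]]      # position 0 attends to position 1
C2 = [[0.0, 1.0], [0.0, 0.0]]      # same positional pattern, words swapped
T_swap = [[0.0, 0.5], [0.5, 0.0]]  # aligns each word with its identical copy

wmd_like = fgw_objective(M, C1, C2, T_swap, alpha=0.0)
fused = fgw_objective(M, C1, C2, T_swap, alpha=0.5)
print(wmd_like, fused)
```

With `alpha=0.0` the swapped alignment costs nothing, which mirrors how an embedding-only cost cannot see the reordering. With `alpha=0.5` the same plan is penalized because it maps the attention relation in one sentence onto a mismatched relation in the other. A full distance would additionally minimize this objective over all valid transport plans, which this sketch does not attempt.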

URL

https://arxiv.org/abs/2211.06229

PDF

https://arxiv.org/pdf/2211.06229.pdf