Semi-Supervised and Unsupervised Sense Annotation via Translations

2021-06-11 15:32:46

Bradley Hauer, Grzegorz Kondrak, Yixing Luan, Arnob Mallik, Lili Mou

arXiv_CL

Abstract

Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been developed in recent years that automatically generate sense annotations suitable for training supervised WSD systems. We present three new methods for creating sense-annotated corpora, which leverage translations, parallel corpora, lexical resources, and contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods use a knowledge-based WSD system to annotate a parallel corpus, and refine the resulting sense annotations by identifying lexical translations. We obtain state-of-the-art results on standard WSD benchmarks.
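
The idea of transferring sense annotations through a translation can be illustrated with a minimal sketch. It assumes a word alignment between the source sentence and its translation is already available; the abstract does not say how the authors obtain or use alignments, so this is a generic annotation-projection example rather than the paper's method. The function name `project_senses`, the example sentences, and the sense labels are all hypothetical placeholders.

```python
from typing import Dict, List, Optional, Tuple


def project_senses(
    source_senses: Dict[int, str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each annotated source token's sense label to its aligned target token.

    source_senses maps a source token index to a sense label.
    alignment is a list of (source_index, target_index) pairs (assumed given).
    Returns one entry per target token; None means "no sense projected".
    """
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses.get(src_idx)
        if sense is None:
            continue  # source token carries no annotation
        if not 0 <= tgt_idx < target_length:
            continue  # ignore alignment links that point outside the target
        if target_senses[tgt_idx] is None:
            # On conflicting links, keep the first label that was projected.
            target_senses[tgt_idx] = sense
    return target_senses


# Hypothetical example: an English sentence and a German translation.
source_tokens = ["the", "bank", "approved", "the", "loan"]
target_tokens = ["die", "Bank", "genehmigte", "den", "Kredit"]

# Placeholder sense labels, not real sense-inventory keys.
source_senses = {1: "bank:financial_institution", 4: "loan:money_lent"}

# Assumed one-to-one alignment for this toy pair.
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

projected = project_senses(source_senses, alignment, len(target_tokens))
for token, sense in zip(target_tokens, projected):
    print(token, sense)
```

In this toy setting, only the target tokens aligned with annotated source tokens receive a label; every other token is left unannotated.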

URL

https://arxiv.org/abs/2106.06462

PDF

https://arxiv.org/pdf/2106.06462.pdf