Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings

2022-04-07 06:50:37

Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Jan Christian Blaise Cruz, Charibeth Cheng

arXiv_CL

Abstract

Language resources such as wordnets remain indispensable for many natural language processing tasks and applications. However, for low-resource languages such as Filipino, existing wordnets are outdated, and producing new ones can be slow and costly. In this paper, we propose an automatic method for constructing a wordnet from scratch using only an unlabeled corpus and a sentence-embedding-based language model. Using this method, we produce FilWordNet, a new wordnet that supplants and improves on the outdated Filipino WordNet. We evaluate our automatically induced senses and synsets by matching them with senses from the Princeton WordNet and by comparing the synsets with those of the old Filipino WordNet. We empirically show that our method can induce existing, as well as potentially new, senses and synsets automatically, without human supervision.
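
The general idea of inducing senses from sentence embeddings can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which clustering algorithm or embedding model is used, so the code below substitutes a simple greedy threshold clustering over cosine similarity, and the vectors in `toy_embeddings` are made-up stand-ins for embeddings of sentences that contain the same target word. The names `cosine`, `induce_senses`, `toy_embeddings`, and the `threshold` value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def induce_senses(embeddings, threshold=0.8):
    """Group context embeddings of one word into clusters (candidate senses).

    Greedy threshold clustering (an assumption, not the paper's method):
    each embedding joins the most similar existing cluster if the cosine
    similarity to its centroid is at least `threshold`; otherwise it
    starts a new cluster. Returns a list of clusters of input indices.
    """
    clusters = []   # each cluster is a list of indices into `embeddings`
    centroids = []  # running mean vector of each cluster
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(vec))
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            # Update the running mean with the new member.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
    return clusters


# Made-up 3-dimensional vectors standing in for sentence embeddings of
# four sentences that all contain the same (unspecified) target word.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]

print(induce_senses(toy_embeddings))  # → [[0, 1], [2, 3]]
```

Each resulting cluster would correspond to one candidate sense of the target word. In a real setting the vectors would come from a sentence-embedding model run over an unlabeled corpus, and grouping senses of different words into synsets would require a further step that this sketch does not attempt.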

URL

https://arxiv.org/abs/2204.03251

PDF

https://arxiv.org/pdf/2204.03251.pdf