Cross-Domain Neural Entity Linking

Abstract
Abstract (translated)
URL
PDF

Abstract

Entity Linking is the task of matching a mention to an entity in a given knowledge base (KB). It contributes to annotating a massive amount of documents existing on the Web to harness new facts about their matched entities. However, existing Entity Linking systems focus on developing models that are typically domain-dependent and robust only to a particular knowledge base on which they have been trained. The performance is not as adequate when being evaluated on documents and knowledge bases from different domains. Approaches based on pre-trained language models, such as Wu et al. (2020), attempt to solve the problem using a zero-shot setup, illustrating some potential when evaluated on a general-domain KB. Nevertheless, the performance is not equivalent when evaluated on a domain-specific KB. To allow for more accurate Entity Linking across different domains, we propose our framework: Cross-Domain Neural Entity Linking (CDNEL). Our objective is to have a single system that enables simultaneous linking to both the general-domain KB and the domain-specific KB. CDNEL works by learning a joint representation space for these knowledge bases from different domains. It is evaluated using the external Entity Linking dataset (Zeshel) constructed by Logeswaran et al. (2019) and the Reddit dataset collected by Botzer et al. (2021), to compare our proposed method with the state-of-the-art results. The proposed framework uses different types of datasets for fine-tuning, resulting in different model variants of CDNEL. When evaluated on four domains included in the Zeshel dataset, these variants achieve an average precision gain of 9%.

Abstract (translated)

URL

https://arxiv.org/abs/2210.15616

PDF

https://arxiv.org/pdf/2210.15616.pdf