Abstract
Text search based on lexical matching of keywords is not satisfactory because of polysemous and synonymous words. Semantic search, which exploits word meanings, generally improves search performance. In this paper, we survey WordNet-based information retrieval systems that employ a word sense disambiguation method to process queries and documents. The problem is that a word often has more than one plausible direct sense, and picking only one of them may assign the wrong sense to the word. Moreover, these previous systems use only word forms to represent word senses and their hypernyms. We propose a novel approach that represents word meanings using the most specific common hypernym of the multiple senses of a word that remain undisambiguated, together with combined WordNet features. Experiments on a benchmark dataset show that, in terms of mean average precision (MAP), our search engine outperforms lexical search by 17.7% and every surveyed WordNet-based search system by at least 9.4%.
Keywords: Ontology, word sense disambiguation, semantic annotation, semantic search.
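The core idea of representing a word by the most specific common hypernym of its remaining senses can be illustrated with a small sketch. The abstract does not give the exact procedure, so this is only an illustration under stated assumptions: the hierarchy below is a hypothetical toy (not real WordNet data), the sense names are invented, and "most specific" is taken to mean the deepest shared ancestor, with depth measured as the longest path from the root. For real WordNet data, NLTK's `Synset.lowest_common_hypernyms` performs a similar computation pairwise.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy hypernym hierarchy (NOT real WordNet data): sense -> direct hypernyms.
HYPERNYMS: dict[str, list[str]] = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "fish": ["animal"],
    "percoid_fish": ["fish"],
    "sea_bass": ["percoid_fish"],
    "freshwater_bass": ["percoid_fish"],
    "attribute": ["entity"],
    "pitch": ["attribute"],
    "bass_voice": ["pitch"],
}


def ancestors(sense: str) -> set[str]:
    """Return the sense itself plus all of its transitive hypernyms."""
    result = {sense}
    stack = [sense]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


@lru_cache(maxsize=None)
def depth(sense: str) -> int:
    """Length of the longest hypernym path from the root down to `sense`."""
    parents = HYPERNYMS[sense]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def most_specific_common_hypernym(senses: list[str]) -> str | None:
    """Deepest node shared by the hypernym closures of all remaining senses."""
    if not senses:
        return None
    common = set.intersection(*(ancestors(s) for s in senses))
    if not common:
        return None
    # Break depth ties by name so the result is deterministic.
    return max(common, key=lambda s: (depth(s), s))


# Disambiguation removed "bass_voice" but could not choose between the two fish senses.
print(most_specific_common_hypernym(["sea_bass", "freshwater_bass"]))  # percoid_fish
# Unrelated senses only share the root.
print(most_specific_common_hypernym(["sea_bass", "bass_voice"]))  # entity
```

When only one sense survives disambiguation, the function returns that sense itself, so a fully disambiguated word is represented by its own sense, and an ambiguous word falls back to a more general but still correct concept rather than a possibly wrong single sense.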
URL
https://arxiv.org/abs/1807.05574