Abstract
Traditional information retrieval systems represent documents and queries as sets of keywords. However, the content of a document or query is defined mainly by both the keywords and the named entities occurring in it. Named entities have ontological features, namely their aliases, classes, and identifiers, which are not evident from their textual appearance. In addition, the meaning of a query may imply latent named entities that are related to the ones appearing explicitly in the query. We propose an ontology-based generalized vector space model for semantic text search. It exploits the ontological features of named entities, and of the entities latently related to them, to reveal the semantics of documents and queries. We also propose a framework that combines different ontologies to exploit their complementary advantages for semantic annotation and searching. Experiments on a benchmark dataset show that our model achieves better search quality than other models.
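As a rough illustration of the idea, the sketch below represents a document or query with one sparse vector per feature space (keywords, entity aliases, entity classes, entity identifiers) and scores a match as a weighted sum of per-space cosine similarities. This is only a minimal sketch under stated assumptions, not the model defined in the paper: the function names, the dictionary keys `alias`, `cls`, and `id`, the equal weights, and the toy data are all hypothetical, and latent entities and ontology combination are not modelled.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def represent(keywords, entities):
    """Build one sparse vector per feature space.

    `entities` is a list of dicts with the hypothetical keys
    'alias', 'cls' and 'id' (assumed output of some annotator).
    """
    return {
        "keyword": Counter(keywords),
        "alias": Counter(e["alias"] for e in entities),
        "class": Counter(e["cls"] for e in entities),
        "identifier": Counter(e["id"] for e in entities),
    }


# Illustrative equal weights; how to weight the spaces is an assumption here.
WEIGHTS = {"keyword": 0.25, "alias": 0.25, "class": 0.25, "identifier": 0.25}


def similarity(doc, query, weights=WEIGHTS):
    """Weighted sum of the cosine similarities in each feature space."""
    return sum(w * cosine(doc[space], query[space]) for space, w in weights.items())


# Toy data: doc_a names the same entity as the query under a different alias.
query = represent(
    ["population"],
    [{"alias": "ho chi minh city", "cls": "City", "id": "e1"}],
)
doc_a = represent(
    ["river", "port"],
    [{"alias": "saigon", "cls": "City", "id": "e1"}],
)
doc_b = represent(
    ["population"],
    [{"alias": "nguyen du", "cls": "Person", "id": "e2"}],
)

score_a = similarity(doc_a, query)
score_b = similarity(doc_b, query)
```

In this toy setup `doc_a` shares no keyword and no alias with the query, so a keyword-only comparison would miss it, yet it still scores above `doc_b` because the entity class and identifier match.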
URL
https://arxiv.org/abs/1807.05579