Abstract
Relations such as "is influenced by", "is known for" or "is a competitor of" are inherently graded: we can rank entity pairs based on how well they satisfy these relations, but it is hard to draw a line between those pairs that satisfy them and those that do not. Such graded relations play a central role in many applications, yet they are typically not covered by existing Knowledge Graphs. In this paper, we consider the possibility of using Large Language Models (LLMs) to fill this gap. To this end, we introduce a new benchmark, in which entity pairs have to be ranked according to how much they satisfy a given graded relation. The task is formulated as a few-shot ranking problem, where models only have access to a description of the relation and five prototypical instances. We use the proposed benchmark to evaluate state-of-the-art relation embedding strategies as well as several recent LLMs, covering both publicly available LLMs and closed models such as GPT-4. Overall, we find a strong correlation between model size and performance, with smaller Language Models struggling to outperform a naive baseline. The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains.
URL
https://arxiv.org/abs/2305.15002