Paper Reading AI Learner

A RelEntLess Benchmark for Modelling Graded Relations between Named Entities

2023-05-24 10:41:24
Asahi Ushio, Jose Camacho Collados, Steven Schockaert

Abstract

Relations such as "is influenced by", "is known for" or "is a competitor of" are inherently graded: we can rank entity pairs based on how well they satisfy these relations, but it is hard to draw a line between those pairs that satisfy them and those that do not. Such graded relations play a central role in many applications, yet they are typically not covered by existing Knowledge Graphs. In this paper, we consider the possibility of using Large Language Models (LLMs) to fill this gap. To this end, we introduce a new benchmark, in which entity pairs have to be ranked according to how much they satisfy a given graded relation. The task is formulated as a few-shot ranking problem, where models only have access to a description of the relation and five prototypical instances. We use the proposed benchmark to evaluate state-of-the-art relation embedding strategies as well as several recent LLMs, covering both publicly available LLMs and closed models such as GPT-4. Overall, we find a strong correlation between model size and performance, with smaller Language Models struggling to outperform a naive baseline. The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains.
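
The few-shot ranking setup described above (a relation description plus five prototypical entity pairs, with candidate pairs ranked by how well they satisfy the relation) can be illustrated with a short sketch. The snippet below is a hypothetical illustration, not the paper's actual protocol: the prompt template, the `score_pair` scorer interface and the toy keyword-overlap scorer are all assumptions made purely for demonstration; a real system would plug in an LLM likelihood score or a relation-embedding similarity instead.

```python
"""Minimal sketch of a few-shot graded-relation ranking setup.

Hypothetical illustration only: the prompt template, the scoring
interface and the toy data below are assumptions, not the benchmark's
actual prompts or evaluation protocol.
"""

from typing import Callable, List, Tuple

EntityPair = Tuple[str, str]


def build_prompt(description: str,
                 prototypes: List[EntityPair],
                 candidate: EntityPair) -> str:
    """Compose a few-shot prompt from the relation description, the five
    prototypical instances, and the candidate pair to be judged."""
    lines = [f"Relation: {description}", "Examples:"]
    lines += [f"  ({a}, {b})" for a, b in prototypes]
    lines.append(f"Candidate: ({candidate[0]}, {candidate[1]})")
    lines.append("How well does the candidate satisfy the relation?")
    return "\n".join(lines)


def rank_pairs(description: str,
               prototypes: List[EntityPair],
               candidates: List[EntityPair],
               score_pair: Callable[[str], float]) -> List[EntityPair]:
    """Rank candidate pairs by a model-assigned score (higher = better fit).

    `score_pair` stands in for any scorer, e.g. the likelihood an LLM
    assigns to an affirmative continuation of the prompt, or the
    similarity of a relation embedding to the prototypes.
    """
    scored = [(score_pair(build_prompt(description, prototypes, c)), c)
              for c in candidates]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [pair for _, pair in scored]


if __name__ == "__main__":
    # Toy scorer: counts keyword overlap with the prompt; purely a stand-in
    # so the sketch runs without querying an actual model.
    def toy_scorer(prompt: str) -> float:
        return sum(prompt.lower().count(w) for w in ("google", "microsoft"))

    ranking = rank_pairs(
        description="is a competitor of",
        prototypes=[("Google", "Microsoft"), ("Coca-Cola", "Pepsi"),
                    ("Nike", "Adidas"), ("Airbus", "Boeing"),
                    ("Samsung", "Apple")],
        candidates=[("Google", "Ferrari"), ("Google", "Microsoft"),
                    ("Mozart", "Beethoven")],
        score_pair=toy_scorer,
    )
    print(ranking)
```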

Abstract (translated)

Relations such as "is influenced by", "is known for" or "is a competitor of" are inherently graded: we can rank entity pairs according to how well they satisfy these relations, but it is difficult to draw a line between the pairs that satisfy them and those that do not. Such graded relations play a key role in many applications, yet they are usually not covered by existing knowledge graphs. In this paper, we consider using Large Language Models (LLMs) to fill this gap. To this end, we introduce a new benchmark in which entity pairs must be ranked according to how well they satisfy a given graded relation. The task is formulated as a few-shot ranking problem, where models only have access to a description of the relation and five prototypical instances. We use the proposed benchmark to evaluate state-of-the-art relation embedding strategies as well as several recent LLMs, covering both publicly available LLMs and closed models such as GPT-4. Overall, we find a strong correlation between model size and performance, with smaller language models struggling to outperform a naive baseline. The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains.

URL

https://arxiv.org/abs/2305.15002

PDF

https://arxiv.org/pdf/2305.15002.pdf

