Paper Reading AI Learner

Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors

2024-04-27 07:06:39
Guozheng Li, Peng Wang, Jiajun Liu, Yikai Guo, Ke Ji, Ziyu Shang, Zijie Xu

Abstract

Relation extraction (RE) is an important task that aims to identify the relations between entities in texts. While large language models (LLMs) have shown remarkable in-context learning (ICL) capability for general zero- and few-shot learning, recent studies indicate that current LLMs still struggle with zero- and few-shot RE. Previous studies have mainly been dedicated to designing prompt formats and selecting good examples to improve ICL-based RE. Although both factors are vital for ICL, fundamentally boosting the ICL capability of LLMs on RE would significantly improve zero- and few-shot RE performance via ICL. To this end, we introduce MICRE (Meta In-Context learning of LLMs for Relation Extraction), a new meta-training framework for zero- and few-shot RE in which an LLM is tuned to do ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE). Through meta-training, the model learns a new RE task in context more effectively by conditioning on a few training examples, with no parameter updates or task-specific templates at inference time, enabling better zero- and few-shot task generalization. We apply MICRE to various LLMs of different model scales and 12 public RE datasets, and then evaluate it on unseen RE benchmarks under zero- and few-shot settings. MICRE delivers comparable or superior performance compared to a range of baselines, including supervised fine-tuning and typical in-context learning methods. We find that the gains are particularly significant for larger model scales, and that using a diverse set of meta-training RE datasets is key to the improvements. Empirically, we show that MICRE can transfer relation semantic knowledge via relation label names during inference on target RE datasets.
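
The sketch below illustrates the general idea of meta-training for in-context RE as described in the abstract: pack a few labeled demonstrations plus an unlabeled query into one prompt and train the model to emit the query's relation label name, so that the same prompt format can be reused on unseen RE benchmarks without parameter updates. The function names, field names, and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of building a MICRE-style meta-training instance (assumed format).
# k demonstrations and one query are sampled from a single RE dataset; the target
# is the query's relation label name, so relation semantics can transfer via
# label names at inference time on unseen datasets.

import random

def format_example(ex):
    """Render one RE instance as a text demonstration (hypothetical template)."""
    return (f"Text: {ex['sentence']}\n"
            f"Head entity: {ex['head']}\n"
            f"Tail entity: {ex['tail']}\n"
            f"Relation: {ex['relation']}")

def build_meta_training_instance(dataset, k=4):
    """Sample k in-context demonstrations plus one query from an RE dataset."""
    sampled = random.sample(dataset, k + 1)
    query = sampled.pop()
    prompt = "\n\n".join(format_example(d) for d in sampled)
    # The query is shown without its relation; the label name is the target.
    prompt += ("\n\nText: " + query["sentence"] +
               "\nHead entity: " + query["head"] +
               "\nTail entity: " + query["tail"] +
               "\nRelation:")
    target = " " + query["relation"]
    return prompt, target

# Toy usage: at inference on an unseen RE benchmark the same prompt format would
# be used with no parameter updates, only the in-context demonstrations change.
toy_dataset = [
    {"sentence": "Marie Curie was born in Warsaw.",
     "head": "Marie Curie", "tail": "Warsaw", "relation": "place of birth"},
    {"sentence": "Paris is the capital of France.",
     "head": "Paris", "tail": "France", "relation": "capital of"},
] * 4
print(build_meta_training_instance(toy_dataset, k=3)[0])
```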

Abstract (translated)

Relation extraction (RE) is an important task of identifying the relations between entities in text. While large language models (LLMs) have demonstrated impressive performance on general zero- and few-shot learning, recent studies show that current LLMs still struggle with zero- and few-shot RE. Previous work has mainly focused on designing prompt formats and selecting good examples to improve ICL-based RE. Although both factors are vital for ICL, if one can fundamentally improve the ICL capability of LLMs on RE, zero- and few-shot RE performance via ICL would improve significantly. Therefore, we introduce MICRE (Meta In-Context learning of LLMs for Relation Extraction), a new meta-training framework for zero- and few-shot RE in which an LLM is tuned to perform ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE). Through meta-training, the model learns a new RE task in context at inference time by conditioning on a few training examples, with no parameter updates or task-specific templates, achieving better zero- and few-shot task generalization. We experiment with MICRE on LLMs of various model scales and 12 public RE datasets, and then evaluate it on unseen RE benchmarks under zero- and few-shot settings. Compared with a range of baselines, including supervised fine-tuning and typical in-context learning methods, MICRE achieves comparable or superior performance. We find that the gains are especially pronounced for larger model scales, and that using a diverse set of meta-training RE datasets is key to the improvements. Empirically, we show that MICRE can transfer relation semantic knowledge via relation label names during inference on target RE datasets.

URL

https://arxiv.org/abs/2404.17807

PDF

https://arxiv.org/pdf/2404.17807.pdf

