Abstract
Information extraction (IE) is a fundamental area of natural language processing in which prompting large language models (LLMs), even with in-context examples, cannot beat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as label-to-span matching. In this paper, we propose a novel framework, MetaIE, which builds a small LM as a meta-model by learning to extract "important information", i.e., the meta-understanding of IE, so that this meta-model can be adapted to all kinds of IE tasks effectively and efficiently. Specifically, MetaIE obtains the small LM via symbolic distillation from an LLM following the label-to-span scheme. We construct the distillation dataset by sampling sentences from language model pre-training datasets (e.g., OpenWebText in our implementation) and prompting an LLM to identify the typed spans of "important information". We evaluate the meta-model under the few-shot adaptation setting. Extensive results on 13 datasets from 6 IE tasks confirm that MetaIE offers a better starting point for few-shot tuning on IE datasets and outperforms other meta-models obtained from (1) vanilla language model pre-training, (2) multi-IE-task pre-training with human annotations, and (3) single-IE-task symbolic distillation from an LLM. Moreover, we provide comprehensive analyses of MetaIE, covering the size of the distillation dataset, the meta-model architecture, and the size of the meta-model.
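To make the label-to-span formalization concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each training example pairs a free-text label with the spans it matches in a sentence, and that the spans are encoded as BIO tags over whitespace tokens. The function names, the BIO encoding, the example sentence, and the label strings are all illustrative assumptions.

```python
from typing import Dict, List


def find_span(tokens: List[str], span_tokens: List[str]) -> int:
    """Return the start index of the first occurrence of span_tokens, or -1."""
    n, m = len(tokens), len(span_tokens)
    for i in range(n - m + 1):
        if tokens[i:i + m] == span_tokens:
            return i
    return -1


def label_to_span_examples(sentence: str,
                           annotations: Dict[str, List[str]]) -> List[dict]:
    """Turn {label: [span, ...]} annotations into one BIO-tagged example per label.

    Hypothetical helper: the label acts as the query, and the tags mark
    which tokens of the sentence belong to a span matching that label.
    """
    tokens = sentence.split()
    examples = []
    for label, spans in annotations.items():
        tags = ["O"] * len(tokens)
        for span in spans:
            span_tokens = span.split()
            if not span_tokens:
                continue  # ignore empty spans
            start = find_span(tokens, span_tokens)
            if start < 0:
                continue  # span not found verbatim in the sentence
            tags[start] = "B"
            for j in range(start + 1, start + len(span_tokens)):
                tags[j] = "I"
        examples.append({"label": label, "tokens": tokens, "tags": tags})
    return examples


# Illustrative annotations of the kind an LLM might return for one sentence.
# An entity type and a relation query are expressed in the same form.
sentence = "Marie Curie was born in Warsaw"
annotations = {
    "person": ["Marie Curie"],
    "birthplace of Marie Curie": ["Warsaw"],
}
examples = label_to_span_examples(sentence, annotations)
for ex in examples:
    print(ex["label"], ex["tags"])
```

Under these assumptions, an entity type and a relation query share one input format, which is what would allow a single meta-model to be adapted to different IE tasks.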
URL
https://arxiv.org/abs/2404.00457