Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

Abstract

Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
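The two-stage idea can be illustrated with a minimal sketch. This is not the L3X implementation: the word-overlap retriever, the capitalized-name matcher standing in for the LLM, the support-count filter, and all function names and parameters (`retrieve`, `generate_candidates`, `scrutinize`, `extract_objects`, `k`, `min_support`) are assumptions made only for illustration.

```python
import re
from collections import Counter


def retrieve(passages, query, k=3):
    """Rank passages by word overlap with the query (toy stand-in for a retriever)."""
    q = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: -len(q & set(re.findall(r"\w+", p.lower()))),
    )
    return ranked[:k]


def generate_candidates(passages, subject):
    """Stage 1 (recall-oriented): collect every capitalized name except the subject.

    Hypothetical placeholder: a real system would prompt an LLM with the
    retrieved passages instead of using a regular expression.
    """
    counts = Counter()
    for p in passages:
        # Count each name at most once per passage.
        for name in set(re.findall(r"\b[A-Z][a-z]+\b", p)):
            if name != subject:
                counts[name] += 1
    return counts


def scrutinize(counts, min_support=2):
    """Stage 2 (precision-oriented): keep candidates supported by enough passages."""
    return sorted(name for name, c in counts.items() if c >= min_support)


def extract_objects(passages, subject, relation, k=3, min_support=2):
    """Run retrieval, candidate generation, and pruning in sequence."""
    top = retrieve(passages, f"{subject} {relation}", k)
    return scrutinize(generate_candidates(top, subject), min_support)
```

Lowering `min_support` trades precision for recall: more candidates survive the second stage, including ones mentioned in only a single passage.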

URL

https://arxiv.org/abs/2405.02732

PDF

https://arxiv.org/pdf/2405.02732.pdf
