Paper Reading AI Learner

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

2024-05-04 12:59:10
Taolin Zhang, Dongyang Li, Qizhou Chen, Chengyu Wang, Longtao Huang, Hui Xue, Xiaofeng He, Jun Huang

Abstract

Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append relevant documents to the prompt of LLMs to perform text generation tasks without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLMs. This issue is particularly important for accurate response generation, as LLMs tend to be ``lost in the middle'' when dealing with input prompts augmented with lengthy documents. In this work, we propose a new pipeline named ``Reinforced Retriever-Reorder-Responder'' (R$^4$) to learn document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the large-scale parameters of the LLMs remain frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment aims to organize retrieved document orderings into beginning, middle, and end positions based on graph attention learning, which maximizes the reinforced reward of response quality. Document representation enhancement further refines the representations of retrieved documents for responses of poor quality via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks compared to strong baselines across various public datasets. The source code and trained models will be released upon paper acceptance.
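The ``lost in the middle'' mitigation at the heart of the reorder step can be illustrated with a minimal sketch. Note this is only a heuristic stand-in with hypothetical helper names: the paper's actual method learns beginning/middle/end assignments via graph attention and a reinforced response-quality reward, whereas this sketch simply alternates the ranked documents between the two ends of the context window:

```python
def reorder_documents(docs, scores):
    """Place the highest-scored retrieved documents at the beginning
    and end of the prompt, pushing low-scored ones to the middle.

    Heuristic stand-in for R^4's learned ordering (not the paper's
    algorithm): rank documents by retrieval score, then alternate
    them between the front and the back of the context so that the
    weakest documents land in the middle, where LLMs attend least.
    """
    # Rank documents by retrieval score, highest first.
    ranked = [d for _, d in sorted(zip(scores, docs), key=lambda p: -p[0])]
    front, back = [], []
    for i, doc in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    # Reverse the back half so the 2nd-best document sits at the very end.
    return front + back[::-1]
```

For example, with five documents scored (0.9, 0.1, 0.5, 0.7, 0.3), the best document ends up first, the second-best last, and the weakest in the middle of the prompt.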


URL

https://arxiv.org/abs/2405.02659

PDF

https://arxiv.org/pdf/2405.02659.pdf

