Abstract
Many recent developments in large language models focus on prompting them to perform specific tasks. One effective prompting method is in-context learning, where the model performs a (possibly new) generation/prediction task given one (or more) examples. Past work has shown that the choice of examples can make a large impact on task performance. However, finding good examples is not straightforward since the definition of a representative group of examples can vary greatly depending on the task. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the large language model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process, design an example retriever model using an LSTM, and train it using proximal policy optimization (PPO). We validate RetICL on math problem solving datasets and show that it outperforms both heuristic and learnable baselines, and achieves state-of-the-art accuracy on the TabMWP dataset. We also use case studies to show that RetICL implicitly learns representations of math problem solving strategies.
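The selection procedure the abstract describes (an MDP whose state is carried by an LSTM, with one in-context example chosen per step so that later picks depend on earlier ones) can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's implementation: the greedy dot-product scoring policy, the shared hidden/embedding dimension, and all function names here are hypothetical, and the actual RetICL retriever architecture, reward from LLM task performance, and PPO training are described in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (hypothetical stand-in for the retriever's encoder)."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Single weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def select_examples(query_emb, corpus_embs, k, cell):
    """Greedily pick k in-context examples in sequence.

    The LSTM hidden state plays the role of the MDP state: each chosen
    example is fed back in, so the next choice conditions on all previous
    ones (unlike methods that score examples independently).
    Assumes hidden_dim == embedding dim so a dot product can score candidates.
    """
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    # Initialize the state from the query (the problem to be solved).
    h, c = cell.step(query_emb, h, c)
    chosen, available = [], set(range(len(corpus_embs)))
    for _ in range(k):
        # Action = argmax score of remaining candidates against the state.
        scores = {j: float(h @ corpus_embs[j]) for j in available}
        best = max(scores, key=scores.get)
        chosen.append(best)
        available.remove(best)
        # Feed the chosen example's embedding back into the state.
        h, c = cell.step(corpus_embs[best], h, c)
    return chosen
```

In the paper this policy is not hand-coded greedy retrieval but is learned with PPO, with the reward derived from whether the LLM solves the task given the selected prompt; the sketch only shows why an LSTM state makes the selection order-aware.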
URL
https://arxiv.org/abs/2305.14502