Paper Reading AI Learner

RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning

2023-05-23 20:15:56
Alexander Scarlatos, Andrew Lan

Abstract

Many recent developments in large language models focus on prompting them to perform specific tasks. One effective prompting method is in-context learning, where the model performs a (possibly new) generation/prediction task given one (or more) examples. Past work has shown that the choice of examples can make a large impact on task performance. However, finding good examples is not straightforward since the definition of a representative group of examples can vary greatly depending on the task. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the large language model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process, design an example retriever model using an LSTM, and train it using proximal policy optimization (PPO). We validate RetICL on math problem solving datasets and show that it outperforms both heuristic and learnable baselines, and achieves state-of-the-art accuracy on the TabMWP dataset. We also use case studies to show that RetICL implicitly learns representations of math problem solving strategies.
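The abstract frames example selection as a Markov decision process. The state is the test problem plus the examples chosen so far, each action picks one more example, and a policy is trained with PPO. The following is a minimal sketch of that framing under loudly stated assumptions. It is not the authors' implementation. The paper uses an LSTM; here a hypothetical `update_state` blends vectors with a decayed average as a stand-in, so that each choice still depends on earlier ones. Candidates are scored with a plain dot product, which is also an assumption. `is_correct` is a hypothetical stub for "prompt the LLM with these examples and check its answer", and the clip value 0.2 is only a common PPO default. The final function is the standard PPO clipped surrogate for a single action.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_state(state: Vector, example: Vector, decay: float = 0.5) -> Vector:
    # Hypothetical stand-in for the paper's LSTM: mix the previous state with the
    # embedding of the example just chosen, so later choices depend on earlier ones.
    return [decay * s + (1.0 - decay) * e for s, e in zip(state, example)]


def select_examples(
    problem_emb: Vector,
    corpus: List[Vector],
    k: int,
    rng: random.Random,
    greedy: bool = False,
) -> Tuple[List[int], List[float]]:
    """Pick k distinct corpus indices one at a time (one MDP step per example)."""
    if k > len(corpus):
        raise ValueError("k cannot exceed the corpus size")
    state = list(problem_emb)  # initial state encodes the test problem
    chosen: List[int] = []
    log_probs: List[float] = []
    for _ in range(k):
        # Actions: any example not already in the prompt.
        candidates = [i for i in range(len(corpus)) if i not in chosen]
        probs = softmax([dot(state, corpus[i]) for i in candidates])
        if greedy:
            j = max(range(len(candidates)), key=lambda t: probs[t])
        else:
            j = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
        chosen.append(candidates[j])
        log_probs.append(math.log(probs[j]))  # kept for the policy-gradient update
        state = update_state(state, corpus[candidates[j]])
    return chosen, log_probs


def episode_reward(chosen: List[int], is_correct: Callable[[List[int]], bool]) -> float:
    # Assumed sparse terminal reward: 1 if the LLM, prompted with the chosen
    # examples in this order, solves the problem; otherwise 0.
    return 1.0 if is_correct(chosen) else 0.0


def ppo_clipped_objective(
    new_log_prob: float, old_log_prob: float, advantage: float, clip_eps: float = 0.2
) -> float:
    # Standard PPO clipped surrogate for one action (to be maximized).
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with `corpus = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]` and problem embedding `[1.0, 0.0]`, greedy selection of two examples returns indices `[0, 2]`. Because selection is sequential, an example already in the prompt is never picked twice. In a learned model the evolving state also lets the policy account for order and for interactions between examples, which independent per-example scoring cannot do.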

URL

https://arxiv.org/abs/2305.14502

PDF

https://arxiv.org/pdf/2305.14502.pdf