Paper Reading AI Learner

Make Your LLM Fully Utilize the Context

2024-04-25 17:55:14
Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

Abstract

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: this https URL.
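The core of IN2 training is data synthesis: a short (~128-token) segment carrying the key information is placed at a random position inside a long context assembled from unrelated filler segments, so that supervision can target any position. The paper does not publish its generation code here, so the sketch below is a simplified, hypothetical illustration of that construction; the function name, the example question, and the segment granularity are all assumptions, not the authors' implementation.

```python
import random

def make_in2_example(fact_segment, distractors, target_segments, seed=0):
    """Sketch of an IN2-style training example: one short segment with the
    key information is dropped at a random position among filler segments,
    emphasizing that any position in the long context may matter."""
    rng = random.Random(seed)
    # Sample filler segments until the context reaches the target length.
    filler = [rng.choice(distractors) for _ in range(target_segments - 1)]
    # The key segment may land anywhere, including the middle.
    position = rng.randrange(target_segments)
    context = filler[:position] + [fact_segment] + filler[position:]
    return {
        "context": "\n".join(context),
        # Illustrative QA pair; the real dataset generates questions
        # whose answers require the embedded segment (or several of them).
        "question": "Which year is mentioned in the key segment?",
        "answer_position": position,
    }
```

A real pipeline would scale the context to 4K-32K tokens and also generate multi-segment questions that require integrating two or more such segments, per the abstract.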


URL

https://arxiv.org/abs/2404.16811

PDF

https://arxiv.org/pdf/2404.16811.pdf

