Paper Reading AI Learner

Enhancing News Summarization with ELearnFit through Efficient In-Context Learning and Efficient Fine-Tuning

2024-05-04 16:48:05
Che Guan, Andrew Chin, Puya Vahabi

Abstract

With the deluge of information delivered by the daily news cycle, there is a growing need to summarize news feeds effectively and efficiently for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient In-Context Learning (ELearn) and Parameter-Efficient Fine-Tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and using simple templates generally improves summary quality. We also find that using relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we study EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes than fine-tuning other layers or using LoRA. We also find that leveraging more relevant training samples through selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and outperforms either approach alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially when only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.
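The abstract reports that more shots and simple templates improve summary quality. The paper does not publish its exact prompt format, so the following is only a minimal sketch of how a k-shot summarization prompt might be assembled from (article, summary) pairs, assuming a hypothetical "Article/Summary" template; the function name and template are illustrative, not the authors' implementation.

```python
def build_fewshot_prompt(examples, article,
                         template="Article: {doc}\nSummary: {summ}\n\n"):
    """Assemble a k-shot prompt for news summarization.

    examples: list of (document, reference_summary) pairs used as shots.
    article:  the target document to be summarized.
    template: a simple per-shot template (assumed format, not from the paper).
    """
    # Concatenate each demonstration using the same simple template.
    shots = "".join(template.format(doc=d, summ=s) for d, s in examples)
    # End with the target article and an open "Summary:" cue for the model.
    return shots + f"Article: {article}\nSummary:"
```

Increasing the number of entries in `examples` corresponds to raising the shot count studied under ELearn; the resulting string would then be passed to an LLM's completion endpoint.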


URL

https://arxiv.org/abs/2405.02710

PDF

https://arxiv.org/pdf/2405.02710.pdf

