Abstract
With the deluge of information delivered by the daily news cycle, there is a growing need to effectively and efficiently summarize news feeds for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities as compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient in-context Learning (ELearn) and Parameter Efficient Fine-tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and utilizing simple templates generally improves the quality of summaries. We also find that utilizing relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we study EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes than fine-tuning other layers or utilizing LoRA. We also find that leveraging more relevant training samples using selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and produces superior performance to either model alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially for situations where only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.
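The ELearn finding above (more shots plus a simple template) can be sketched as a prompt-construction routine. This is an illustrative assumption about the setup, not the authors' exact template or code; the function name, template wording, and demo examples are hypothetical.

```python
# Hedged sketch: building a k-shot summarization prompt with a simple
# "Article / Summary" template, in the spirit of the ELearn finding that
# more shots and simple templates generally help. Template wording and
# example texts are illustrative assumptions, not the paper's exact setup.

def build_fewshot_prompt(examples, article, k=2):
    """Assemble a k-shot prompt: k (article, summary) demonstrations
    followed by the query article with an open "Summary:" slot."""
    parts = []
    for doc, summary in examples[:k]:
        parts.append(f"Article: {doc}\nSummary: {summary}\n")
    # The query article ends with "Summary:" so the model completes it.
    parts.append(f"Article: {article}\nSummary:")
    return "\n".join(parts)

# Toy demonstrations standing in for XSum-style (document, summary) pairs.
demo = [
    ("A storm hit the coast overnight, cutting power to thousands.",
     "Overnight storm leaves thousands without power."),
    ("The council approved a new cycling lane through the city centre.",
     "City centre to get new cycling lane."),
]
prompt = build_fewshot_prompt(
    demo, "Scientists reported record ocean temperatures this summer.", k=2
)
print(prompt.count("Article:"))  # 3: two demonstrations plus the query
```

Increasing `k` here corresponds to adding shots; the paper's result suggests that, up to a point, larger `k` with this kind of plain template tends to improve summary quality more reliably than elaborate prompt engineering.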
URL
https://arxiv.org/abs/2405.02710