
Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation

2024-04-19 02:59:09
Lasal Jayawardena, Prasan Yapa

Abstract

Over the past year, the field of Natural Language Generation (NLG) has seen an exponential surge, largely driven by the introduction of Large Language Models (LLMs). These models deliver highly effective performance across a broad range of Natural Language Processing and Generation tasks. However, applying them to domain-specific tasks such as paraphrasing presents significant challenges: their large parameter counts make them difficult to run on commodity hardware, and their slow inference leads to high costs in production settings. In this study, we tackle these obstacles by using LLMs to develop three distinct paraphrasing models through a method known as sequence-level knowledge distillation. The distilled models preserve the quality of the paraphrases generated by the LLM while offering faster inference and the ability to generate diverse paraphrases of comparable quality. A notable characteristic of these models is that they exhibit syntactic diversity while also preserving lexical diversity, a combination previously uncommon, owing to data quality issues in existing datasets, and not typically observed in neural approaches. Human evaluation of our models shows only a 4% drop in performance relative to the LLM teacher used in the distillation process, even though the students are roughly 1000 times smaller. This research offers a more efficient and cost-effective solution for paraphrasing tasks, making a significant contribution to the NLG field.
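
Concretely, sequence-level knowledge distillation means fine-tuning a small student on whole output sequences sampled from the teacher LLM, rather than matching the teacher's per-token distributions. The sketch below illustrates the idea under stated assumptions: it assumes the teacher's paraphrases have already been collected into (source, paraphrase) pairs, uses the Hugging Face transformers and datasets libraries, and picks t5-small as an illustrative stand-in for the student, not the paper's exact architecture or training setup.

# Minimal sketch of sequence-level knowledge distillation for paraphrasing.
# The model name, corpus, and hyperparameters are illustrative assumptions,
# not the paper's reported configuration.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Hypothetical teacher-generated corpus: each source sentence is paired with
# a paraphrase sampled from the large teacher LLM; these whole sequences act
# as the distillation targets.
pairs = [
    {"source": "The weather is nice today.",
     "target": "It is a pleasant day outside."},
    # ... in practice, thousands of teacher-generated pairs
]

tokenizer = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def tokenize(example):
    inputs = tokenizer("paraphrase: " + example["source"],
                       truncation=True, max_length=64)
    inputs["labels"] = tokenizer(text_target=example["target"],
                                 truncation=True, max_length=64)["input_ids"]
    return inputs

train_ds = Dataset.from_list(pairs).map(
    tokenize, remove_columns=["source", "target"])

# Ordinary cross-entropy training on the teacher's full output sequences is
# what makes this *sequence-level* (rather than logit-level) distillation.
trainer = Seq2SeqTrainer(
    model=student,
    args=Seq2SeqTrainingArguments(output_dir="student-paraphraser",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=student),
)
trainer.train()

At inference time, the distilled student can produce multiple diverse paraphrases purely through the decoding strategy; for example, with diverse beam search (the settings below are illustrative):

# Generate three diverse paraphrases from the trained student.
outputs = student.generate(
    **tokenizer("paraphrase: The weather is nice today.",
                return_tensors="pt"),
    num_beams=6, num_beam_groups=3, diversity_penalty=0.5,
    num_return_sequences=3, max_length=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))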

URL

https://arxiv.org/abs/2404.12596

PDF

https://arxiv.org/pdf/2404.12596.pdf

