Abstract
Over the past year, the field of Natural Language Generation (NLG) has experienced a dramatic surge, largely due to the introduction of Large Language Models (LLMs). These models have delivered state-of-the-art performance across a wide range of Natural Language Processing and Generation tasks. However, their application to domain-specific tasks, such as paraphrasing, presents significant challenges. Their large parameter counts make them difficult to run on commodity hardware, and their slow inference leads to high costs in a production setting. In this study, we address these obstacles by using LLMs to develop three distinct models for paraphrasing, applying a method known as sequence-level knowledge distillation. The distilled models maintain the quality of the paraphrases generated by the LLM while offering faster inference and the ability to generate diverse paraphrases of comparable quality. A notable characteristic of these models is that they exhibit syntactic diversity while also preserving lexical diversity, a combination previously uncommon owing to quality issues in existing datasets and rarely observed in neural approaches. Human evaluation shows only a 4% drop in performance compared to the LLM teacher used in the distillation process, despite the distilled models being 1000 times smaller. This research makes a significant contribution to the NLG field, offering a more efficient and cost-effective solution for paraphrasing tasks.
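The sequence-level knowledge distillation recipe the abstract refers to can be sketched as follows. This is a minimal illustration of the data flow only: `teacher_generate` is a hypothetical stand-in for the actual LLM teacher (which the abstract does not name), and in practice the pairs it yields would be used to fine-tune a small seq2seq student model.

```python
# Minimal sketch of sequence-level knowledge distillation for paraphrasing.
# The teacher here is a stub, not a real LLM: the point is the data flow,
# in which the teacher's generated paraphrases become the training targets
# for a much smaller student model.

def teacher_generate(sentence: str, n: int = 3) -> list[str]:
    # Hypothetical stand-in for the LLM teacher; in the paper's setting
    # this would be a large model prompted to paraphrase `sentence`.
    return [f"{sentence} (paraphrase {i})" for i in range(1, n + 1)]

def build_distillation_corpus(sources: list[str]) -> list[tuple[str, str]]:
    # Sequence-level KD trains the student on (source, teacher_output)
    # pairs, i.e. on the teacher's full generated sequences rather than
    # on its per-token probability distributions.
    corpus = []
    for src in sources:
        for paraphrase in teacher_generate(src):
            corpus.append((src, paraphrase))
    return corpus

corpus = build_distillation_corpus(["The cat sat on the mat."])
for src, tgt in corpus:
    print(src, "->", tgt)
```

Each resulting pair serves as one supervised training example for the student, so the student never needs access to the teacher's internals, only to its generated text.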
URL
https://arxiv.org/abs/2404.12596