Abstract
The training of Transformer models has revolutionized natural language processing and computer vision, but it remains a resource-intensive and time-consuming process. This paper investigates whether the early-bird ticket hypothesis can be applied to improve the training efficiency of Transformer models. We propose a methodology that combines iterative pruning, mask-distance calculation, and selective retraining to identify early-bird tickets in several Transformer architectures, including ViT, Swin-T, GPT-2, and RoBERTa. Our experimental results demonstrate that early-bird tickets can be found consistently within the first few epochs of training or fine-tuning, enabling significant resource savings without compromising performance. The pruned models obtained from early-bird tickets achieve accuracy comparable to, and sometimes better than, their unpruned counterparts while substantially reducing memory usage. Furthermore, our comparative analysis highlights the generalizability of the early-bird ticket phenomenon across different Transformer models and tasks. This research contributes to the development of efficient training strategies for Transformer models, making them more accessible and resource-friendly. By leveraging early-bird tickets, practitioners can accelerate progress in natural language processing and computer vision while reducing the computational burden of training Transformer models.
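The mask-distance idea in the abstract, comparing the binary pruning masks produced at successive epochs and declaring an early-bird ticket once they stop changing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the magnitude-pruning rule, the function names, and the `eps`/`window` values are all assumptions.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights
    (a common pruning criterion; assumed here, not taken from the paper)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)  # number of weights to prune
    threshold = np.sort(flat)[k] if k < len(flat) else np.inf
    return (np.abs(weights) >= threshold).astype(np.uint8)

def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two binary pruning masks."""
    return float(np.mean(mask_a != mask_b))

def found_early_bird(mask_history, eps=0.1, window=5):
    """Declare an early-bird ticket once the distance between the latest mask and
    each of the previous `window - 1` masks stays below eps (illustrative rule)."""
    if len(mask_history) < window:
        return False
    recent = mask_history[-window:]
    return max(mask_distance(recent[i], recent[-1]) for i in range(window - 1)) < eps
```

In use, one mask would be computed per epoch from the current weights and appended to `mask_history`; training of the full model can stop, and pruned retraining begin, as soon as `found_early_bird` returns `True`.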
URL
https://arxiv.org/abs/2405.02353