Paper Reading AI Learner

READ: Recurrent Adaptation of Large Transformers

2023-05-24 16:59:41
Sid Wang, John Nguyen, Ke Li, Carole-Jean Wu

Abstract

Fine-tuning large-scale Transformers has led to an explosion of AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size and number of tasks increase. Parameter-efficient transfer learning (PETL) methods aim to address these challenges. While effective in reducing the number of trainable parameters, PETL methods still require significant energy and computational resources to fine-tune. In this paper, we introduce REcurrent ADaptation (READ) -- a lightweight and memory-efficient fine-tuning method -- to overcome the limitations of the current PETL approaches. Specifically, READ inserts a small RNN network alongside the backbone model so that the model does not have to back-propagate through the large backbone network. Through comprehensive empirical evaluation on the GLUE benchmark, we demonstrate READ can achieve a 56% reduction in training memory consumption and an 84% reduction in GPU energy usage while retaining high model quality compared to full-tuning. Additionally, the model size of READ does not grow with the backbone model size, making it a highly scalable solution for fine-tuning large Transformers.
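The core idea -- a small trainable RNN running alongside a frozen backbone, so gradients never flow through the large network -- can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the "backbone" below is a stack of fixed linear-tanh layers standing in for frozen transformer blocks, and all parameter names (W_in, W_h, W_out) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_rnn, n_layers = 16, 4, 3

# Frozen "backbone": fixed layers standing in for pre-trained transformer blocks.
# These weights are never updated and no activations need to be kept for backprop.
backbone = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
            for _ in range(n_layers)]

# Trainable READ-style side network: a tiny RNN with input and output projections.
# Only these parameters would receive gradients during fine-tuning.
W_in = rng.standard_normal((d_model, d_rnn)) * 0.1   # hypothetical name
W_h = rng.standard_normal((d_rnn, d_rnn)) * 0.1      # hypothetical name
W_out = rng.standard_normal((d_rnn, d_model)) * 0.1  # hypothetical name

def forward(x):
    """One READ-style forward pass: the backbone runs as a frozen feature
    extractor; the side RNN consumes each layer's hidden state as its input
    sequence, and its final state produces a learned correction."""
    h_rnn = np.zeros(d_rnn)
    h = x
    for W in backbone:
        h = np.tanh(h @ W)                           # frozen backbone layer
        h_rnn = np.tanh(h @ W_in + h_rnn @ W_h)      # side-RNN step on that state
    return h + h_rnn @ W_out                         # backbone output + correction

x = rng.standard_normal(d_model)
y = forward(x)
```

Because only the side network is trained, a real implementation (e.g. in PyTorch) would run the backbone under no-gradient mode, so the backbone's activations need not be stored for the backward pass -- this is where the memory and energy savings reported in the abstract come from.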

Abstract (translated)

Fine-tuning large-scale Transformer models has driven an explosion of AI applications in Natural Language Processing and Computer Vision tasks. However, as model size and the number of tasks grow, fine-tuning all pre-trained model parameters becomes impractical. Parameter-efficient transfer learning (PETL) methods aim to address these problems. Although PETL methods effectively reduce the number of trainable parameters, they still require substantial energy and computational resources for fine-tuning. In this paper, we introduce Recurrent Adaptation (READ), a lightweight and memory-efficient fine-tuning method, to overcome the limitations of current PETL approaches. Specifically, READ places a small recurrent neural network alongside the backbone model, so that the model does not need to back-propagate through the large backbone network. Through a comprehensive empirical evaluation on the GLUE benchmark, we show that READ reduces training memory consumption by 56% and GPU energy usage by 84% while retaining high model quality compared to full-tuning. Moreover, the model size of READ is independent of the backbone model size, making it a highly scalable solution for fine-tuning large Transformers.

URL

https://arxiv.org/abs/2305.15348

PDF

https://arxiv.org/pdf/2305.15348.pdf

