AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

2023-01-28 08:51:23
Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen

Abstract

Large pretrained language models are widely adapted to downstream NLP tasks via task-specific fine-tuning. Recently, an array of Parameter-Efficient Fine-Tuning (PEFT) methods has achieved strong task performance while updating far fewer parameters than full-model fine-tuning. However, it is non-trivial to make informed per-task design choices (i.e., to create PEFT configurations): selecting PEFT architectures and modules, setting the number of tunable parameters, and even choosing the layers in which the PEFT modules are inserted. Consequently, current, manually set PEFT configurations are likely suboptimal for many tasks in terms of the performance-to-efficiency trade-off. To address the core question of PEFT configuration selection, which aims to control and maximise the balance between performance and parameter efficiency, we first define a rich configuration search space spanning multiple representative PEFT modules along with finer-grained configuration decisions over the modules (e.g., parameter budget, insertion layer). We then propose AutoPEFT, a novel framework that traverses this configuration space by automatically configuring multiple PEFT modules via high-dimensional Bayesian optimisation. AutoPEFT-found configurations are resource-scalable and task-transferable: with the search conducted on a single task, they outperform existing PEFT methods on average on the standard GLUE benchmark. Per-task AutoPEFT-based configuration search even outperforms full-model fine-tuning.
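
To make the idea concrete, the minimal sketch below illustrates a configuration search of this kind. It is not the authors' implementation: the 12-layer encoder, the three module types (serial adapter, parallel adapter, prefix), the bottleneck-size grid, and the evaluate_on_dev scoring function are all assumptions introduced for illustration, and a plain Gaussian-process surrogate with an expected-improvement acquisition stands in for the paper's high-dimensional Bayesian optimisation. The sketch also maximises dev-set score only, whereas AutoPEFT targets the balance between performance and parameter efficiency.

# Illustrative sketch of a PEFT configuration search; not the AutoPEFT implementation.
import random

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

LAYERS = 12                             # assumed encoder depth (e.g., BERT-base)
BOTTLENECK_SIZES = [6, 12, 24, 48, 96]  # assumed per-module parameter-budget grid

def random_config():
    # A configuration: one on/off bit per (layer, module type) plus a shared
    # bottleneck size; a simplification of a multi-module PEFT search space.
    return {
        "serial_adapter": [random.randint(0, 1) for _ in range(LAYERS)],
        "parallel_adapter": [random.randint(0, 1) for _ in range(LAYERS)],
        "prefix": [random.randint(0, 1) for _ in range(LAYERS)],
        "size": random.choice(BOTTLENECK_SIZES),
    }

def encode(cfg):
    # Flatten a configuration into a numeric vector for the surrogate model.
    bits = cfg["serial_adapter"] + cfg["parallel_adapter"] + cfg["prefix"]
    return np.array(bits + [cfg["size"]], dtype=float)

def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for maximisation: prefer candidates that are
    # likely to beat the incumbent best observed score.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def search(evaluate_on_dev, n_init=10, n_iter=40, n_candidates=500):
    # evaluate_on_dev(cfg) is hypothetical: fine-tune with the given PEFT
    # configuration and return a dev-set score.
    configs = [random_config() for _ in range(n_init)]
    X = np.stack([encode(c) for c in configs])
    y = np.array([evaluate_on_dev(c) for c in configs])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cands = [random_config() for _ in range(n_candidates)]
        Xc = np.stack([encode(c) for c in cands])
        mu, sigma = gp.predict(Xc, return_std=True)
        nxt = cands[int(np.argmax(expected_improvement(mu, sigma, y.max())))]
        configs.append(nxt)
        X = np.vstack([X, encode(nxt)])
        y = np.append(y, evaluate_on_dev(nxt))
    return configs[int(np.argmax(y))]

A real search would additionally penalise or constrain the total tunable-parameter count of each candidate, since the point of the method is the performance-efficiency trade-off rather than raw dev-set score.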

URL

https://arxiv.org/abs/2301.12132

PDF

https://arxiv.org/pdf/2301.12132.pdf

