Paper Reading AI Learner

A Stability Analysis of Fine-Tuning a Pre-Trained Model

2023-01-24 05:11:17
Zihao Fu, Anthony Man-Cho So, Nigel Collier

Abstract

Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, or GPT) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from an instability problem: tuning the same model under the same setting can yield significantly different performance. Many recent works have proposed methods to mitigate this problem, but there is no theoretical understanding of why and how these methods work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely, full fine-tuning and head tuning. We define the stability under each setting and prove the corresponding stability bounds. The theoretical bounds explain why and how several existing methods stabilize the fine-tuning procedure. In addition to explaining most of the observed empirical findings, our theoretical analysis framework can also help in the design of effective and provable methods. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure, namely, Maximal Margin Regularizer (MMR), Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate the proposed approaches on 11 widely used real-world benchmark datasets, as well as hundreds of synthetic classification datasets. The experimental results show that our methods significantly stabilize the fine-tuning procedure and corroborate our theoretical analysis.
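The instability the abstract describes is typically quantified by repeating the same fine-tuning run with different random seeds and measuring the spread of the resulting scores. The sketch below illustrates that measurement only; the score values and the `instability` helper are hypothetical and are not taken from the paper.

```python
import statistics

def instability(run_scores):
    """Sample standard deviation of scores across repeated runs.

    A higher value means the same model fine-tuned under the same
    setting landed at noticeably different performance each time.
    """
    return statistics.stdev(run_scores)

# Hypothetical accuracies from five fine-tuning runs that differ
# only in the random seed (values invented for illustration).
unstable_runs = [0.91, 0.62, 0.88, 0.55, 0.90]  # degenerate runs mixed in
stable_runs = [0.89, 0.90, 0.88, 0.91, 0.90]    # after a stabilizing method

print(f"unstable: {instability(unstable_runs):.3f}")
print(f"stable:   {instability(stable_runs):.3f}")
```

A stabilizing method succeeds if it shrinks this spread without hurting the mean score, which is how the paper's benchmark comparisons are framed.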

URL

https://arxiv.org/abs/2301.09820

PDF

https://arxiv.org/pdf/2301.09820.pdf
