
SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models

2024-11-04 15:34:30
Linglan Zhao, Xuerui Zhang, Ke Yan, Shouhong Ding, Weiran Huang

Abstract

Continual learning aims to incrementally acquire new concepts from data streams while resisting the forgetting of previous knowledge. With the rise of powerful pre-trained models (PTMs), there is growing interest in training incremental learning systems on top of these foundation models rather than learning from scratch. Existing works often treat PTMs as a strong starting point and directly apply parameter-efficient tuning (PET) in the first session to adapt to downstream tasks. In the following sessions, most methods freeze the model parameters to tackle forgetting. However, applying PET directly to downstream data cannot fully exploit the knowledge inherent in PTMs. Additionally, freezing the parameters in incremental sessions hinders the model's plasticity toward novel concepts not covered in the first session. To solve these issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. In particular, to inherit general knowledge from the foundation model, we include a transfer loss that measures the correlation between the PTM and the PET-applied model. After this calibration in the first session, the slow efficient-tuning parameters can capture more informative features, improving generalization to incoming classes. Moreover, to further incorporate novel concepts, we strike a balance between stability and plasticity by fixing the slow efficient-tuning parameters while continuously updating the fast ones. Specifically, a cross-classification loss with feature alignment is proposed to circumvent catastrophic forgetting. During inference, we introduce an entropy-based aggregation strategy to dynamically exploit the complementarity of the slow and fast learners. Extensive experiments on seven benchmark datasets verify the effectiveness of our method, which significantly surpasses the state of the art.
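The abstract compresses several mechanisms into a few sentences; the two sketches below unpack the ones concrete enough to illustrate. First, the transfer loss that "measures the correlation between the PTM and the PET-applied model". The paper's exact formulation is not reproduced on this page, so the following is a minimal, hypothetical PyTorch reading: the frozen PTM's features and the PET-tuned (slow) learner's features are standardized over the batch, and deviation of their cross-correlation matrix from the identity is penalized. The function name `transfer_loss` and the `1e-6` stabilizer are illustrative assumptions, not the authors' code.

```python
import torch

# Hypothetical sketch of the transfer loss described in the abstract: the
# PET-tuned (slow) learner is encouraged to stay correlated with the frozen
# PTM so that general knowledge is inherited. The paper's exact formulation
# may differ; here, deviation of the batch-normalized cross-correlation
# matrix from the identity is penalized.
def transfer_loss(ptm_feats: torch.Tensor, pet_feats: torch.Tensor) -> torch.Tensor:
    # Standardize each feature dimension over the batch; inputs are (N, D).
    z1 = (ptm_feats - ptm_feats.mean(0)) / (ptm_feats.std(0) + 1e-6)
    z2 = (pet_feats - pet_feats.mean(0)) / (pet_feats.std(0) + 1e-6)
    corr = (z1.T @ z2) / z1.shape[0]                      # (D, D) cross-correlation
    eye = torch.eye(corr.shape[0], device=corr.device)
    # Pull matching dimensions toward correlation 1, all others toward 0.
    return ((corr - eye) ** 2).sum()
```

Second, the entropy-based aggregation used at inference. The weighting rule here is again an assumption: each learner's softmax output is weighted per sample by its confidence, with lower predictive entropy yielding a larger weight, so the ensemble trusts whichever of the slow and fast learners is more certain. `prediction_entropy` and `aggregate_slow_fast` are illustrative names.

```python
import torch
import torch.nn.functional as F

def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution, per sample."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

# Hypothetical aggregation rule: weight each learner per sample by its
# confidence (lower entropy -> larger weight), then mix the two softmax
# distributions with those weights.
def aggregate_slow_fast(slow_logits: torch.Tensor,
                        fast_logits: torch.Tensor) -> torch.Tensor:
    h = torch.stack([prediction_entropy(slow_logits),
                     prediction_entropy(fast_logits)], dim=-1)     # (N, 2)
    weights = F.softmax(-h, dim=-1)                                # (N, 2)
    probs = torch.stack([F.softmax(slow_logits, dim=-1),
                         F.softmax(fast_logits, dim=-1)], dim=-1)  # (N, C, 2)
    return (probs * weights.unsqueeze(1)).sum(dim=-1)              # (N, C)
```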

URL

https://arxiv.org/abs/2411.02175

PDF

https://arxiv.org/pdf/2411.02175.pdf

