Abstract
Continual learning aims to incrementally acquire new concepts from data streams while resisting the forgetting of previous knowledge. With the rise of powerful pre-trained models (PTMs), there is growing interest in training incremental learning systems on top of these foundation models rather than learning from scratch. Existing works often treat the PTM as a strong starting point and directly apply parameter-efficient tuning (PET) in the first session to adapt to downstream tasks. In the following sessions, most methods freeze the model parameters to tackle forgetting. However, applying PET directly to downstream data cannot fully exploit the knowledge inherent in PTMs. Additionally, freezing the parameters in incremental sessions hinders the model's plasticity toward novel concepts not covered in the first session. To address these issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. In particular, to inherit general knowledge from the foundation model, we include a transfer loss that measures the correlation between the PTM and the PET-applied model. After calibration in the first session, the slow efficient-tuning parameters capture more informative features, improving generalization to incoming classes. Moreover, to further incorporate novel concepts, we strike a balance between stability and plasticity by fixing the slow efficient-tuning parameters and continuously updating the fast ones. Specifically, a cross-classification loss with feature alignment is proposed to circumvent catastrophic forgetting. During inference, we introduce an entropy-based aggregation strategy that dynamically exploits the complementarity of the slow and fast learners. Extensive experiments on seven benchmark datasets verify the effectiveness of our method, which significantly surpasses the state of the art.
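Since the abstract only sketches the method, the snippet below is a minimal PyTorch-style illustration (not the authors' implementation) of one component: an entropy-based aggregation of the slow and fast learners' predictions, where the more confident (lower-entropy) learner receives a larger per-sample weight. The function and variable names (`slow_logits`, `fast_logits`) and the inverse-entropy softmax weighting are assumptions made for illustration; SAFE's exact aggregation rule may differ.

```python
import torch
import torch.nn.functional as F


def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax prediction, computed per sample."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


def entropy_weighted_aggregation(slow_logits: torch.Tensor,
                                 fast_logits: torch.Tensor) -> torch.Tensor:
    """Fuse slow- and fast-learner predictions per sample.

    Assumption: the lower-entropy (more confident) learner gets the larger
    weight via a softmax over negative entropies; the paper's exact
    weighting scheme may differ.
    """
    h_slow = prediction_entropy(slow_logits)   # shape (B,)
    h_fast = prediction_entropy(fast_logits)   # shape (B,)
    # Per-sample confidence weights: lower entropy -> higher weight.
    weights = F.softmax(torch.stack([-h_slow, -h_fast], dim=-1), dim=-1)  # (B, 2)
    probs = torch.stack([F.softmax(slow_logits, dim=-1),
                         F.softmax(fast_logits, dim=-1)], dim=-1)          # (B, C, 2)
    return (probs * weights.unsqueeze(1)).sum(dim=-1)                      # (B, C)


if __name__ == "__main__":
    # Toy usage: a batch of 4 samples over 10 classes with random logits.
    slow = torch.randn(4, 10)
    fast = torch.randn(4, 10)
    fused = entropy_weighted_aggregation(slow, fast)
    print(fused.argmax(dim=-1))
```

The per-sample weighting lets the fused prediction lean on the slow learner for classes it generalizes to and on the fast learner for newly learned concepts, which is one plausible reading of how the two branches' complementarity could be exploited at inference.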
URL
https://arxiv.org/abs/2411.02175