Abstract
Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks because old knowledge is overwritten. Among the approaches to mitigating this issue, regularization techniques aim to identify and constrain "important" parameters so that previous knowledge is preserved. Given the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters exhibiting higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach strikes a better balance between mitigating catastrophic forgetting and maintaining strong performance on newly learned tasks.
Abstract (translated)
Catastrophic forgetting in deep neural networks refers to the phenomenon where learning new tasks degrades performance on previously learned tasks because old knowledge is overwritten by new knowledge. To mitigate this problem, regularization techniques aim to identify and constrain "important" parameters in order to preserve prior knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters exhibiting higher activity (movement and variability) during this plateau reveal relatively flat regions of the loss landscape, making them suitable for adapting to new tasks while retaining previous knowledge. Our comprehensive experiments show that this approach achieves a better balance between mitigating catastrophic forgetting and performing well on newly learned tasks.

Specifically:
- Catastrophic forgetting refers to the degradation of old-task performance when a deep neural network learns new tasks.
- A common way to address this is to apply regularization techniques that identify and constrain "important" parameters, protecting previous knowledge from being overwritten.
- In a nonconvex optimization space, we propose a new perspective: instead of monitoring parameters throughout training, focus on how they change as training approaches its end and performance reaches a plateau. We argue that parameters that change more during this phase indicate where the network can adapt to new tasks while retaining old knowledge.
- Our experimental results show that this method more effectively balances mitigating forgetting against improving performance on newly learned tasks.

Overall, this approach of using parameter activity during the final training plateau to decide which parameters to protect shows superior results.
URL
https://arxiv.org/abs/2507.08736
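The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of one plausible reading: per-parameter variance accumulated over the final training plateau serves as the activity statistic, and its inverse weights an EWC-style quadratic penalty when the next task is trained. The names (`PlateauActivityTracker`, `consolidation_loss`) and every implementation detail here are our assumptions for illustration, not the paper's actual code.

```python
# Hypothetical sketch of plateau-phase parameter tracking for a
# regularization-based continual-learning penalty. The activity
# statistic (per-parameter variance over plateau steps) and the
# EWC-style quadratic penalty are assumptions, not the paper's code.
import torch
import torch.nn as nn


class PlateauActivityTracker:
    """Tracks per-parameter variability over the final training plateau.

    Parameters that move a lot during the plateau are treated as lying
    along relatively flat directions of the loss landscape (cheap to
    change for a new task); parameters that stay still are treated as
    important and are penalized for drifting on the next task.
    """

    def __init__(self, model: nn.Module):
        self.model = model
        self.n = 0  # number of plateau-phase snapshots taken so far
        # Running sums for per-parameter mean/variance estimates.
        self.sum = {k: torch.zeros_like(p) for k, p in model.named_parameters()}
        self.sum_sq = {k: torch.zeros_like(p) for k, p in model.named_parameters()}

    @torch.no_grad()
    def update(self):
        """Call once per optimizer step while training sits on the plateau."""
        self.n += 1
        for k, p in self.model.named_parameters():
            self.sum[k] += p
            self.sum_sq[k] += p * p

    @torch.no_grad()
    def importance(self, eps: float = 1e-8):
        """Importance = inverse plateau variance (low activity -> high importance)."""
        assert self.n > 0, "call update() during the plateau first"
        omega = {}
        for k in self.sum:
            mean = self.sum[k] / self.n
            var = self.sum_sq[k] / self.n - mean * mean
            omega[k] = 1.0 / (var.clamp_min(0.0) + eps)
            omega[k] /= omega[k].mean()  # normalize per tensor for a stable scale
        return omega


def consolidation_loss(model, anchor_params, omega, lam=1.0):
    """Quadratic penalty keeping low-activity (important) weights near their
    values at the end of the previous task (EWC-style form)."""
    loss = 0.0
    for k, p in model.named_parameters():
        loss = loss + (omega[k] * (p - anchor_params[k]) ** 2).sum()
    return lam * loss
```

In use, `tracker.update()` would be called after each optimizer step during the last few epochs, once the validation loss has flattened; before training the next task, the weights would be snapshotted, e.g. `anchor = {k: p.detach().clone() for k, p in model.named_parameters()}`, and `consolidation_loss(model, anchor, tracker.importance())` added to the new task's objective.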