Paper Reading AI Learner

Catastrophic Forgetting Mitigation Through Plateau Phase Activity Profiling

2025-07-11 16:38:40
Idan Mashiach, Oren Glickman, Tom Tirer

Abstract

Catastrophic forgetting in deep neural networks occurs when learning new tasks degrades performance on previously learned tasks due to knowledge overwriting. Among the approaches to mitigate this issue, regularization techniques aim to identify and constrain "important" parameters to preserve previous knowledge. In the highly nonconvex optimization landscape of deep learning, we propose a novel perspective: tracking parameters during the final training plateau is more effective than monitoring them throughout the entire training process. We argue that parameters that exhibit higher activity (movement and variability) during this plateau reveal directions in the loss landscape that are relatively flat, making them suitable for adaptation to new tasks while preserving knowledge from previous ones. Our comprehensive experiments demonstrate that this approach achieves superior performance in balancing catastrophic forgetting mitigation with strong performance on newly learned tasks.

Abstract (translated)

深度神经网络中的灾难性遗忘现象是指在学习新任务时,会损害之前已经学过的任务的表现,原因是旧知识被新的知识覆盖。为缓解这个问题,正则化技术旨在识别并约束“重要”参数以保存先前的知识。在深度学习中高度非凸的优化景观中,我们提出了一种新颖的观点:追踪最终训练平台期期间的参数比在整个训练过程中监测它们更有效。我们认为,在这个平台期内表现出更高活动(移动和变化)的参数揭示了损失景观中的相对平坦区域,这使得它们适合适应新任务的同时保留之前的知识。我们的全面实验表明,这种方法在缓解灾难性遗忘与在新学习的任务上取得优异表现之间实现了更好的平衡。 具体来说: - 灾难性遗忘指的是深度神经网络在学习新任务时导致旧任务性能下降的现象。 - 为解决这一问题,一种常用的方法是采用正则化技术来识别和限制“重要”的参数,从而保护之前的知识不被覆盖。 - 在非凸的优化空间中,我们提出了一种新的视角:关注模型在训练接近尾声、达到性能平台期时参数的变化情况,而不是在整个训练过程中持续监控它们。我们认为,在这个阶段变化较大的参数表明网络能够在保留旧知识的同时适应新任务。 - 我们的实验结果证明了这种方法能更有效地平衡灾难性遗忘的缓解与新的学习任务上的表现提升之间的关系。 总的来说,这种侧重于识别和保护在最终训练平台期活跃的重要参数的方法显示出了优越的效果。

URL

https://arxiv.org/abs/2507.08736

PDF

https://arxiv.org/pdf/2507.08736.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot