Paper Reading AI Learner

Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks

2023-03-18 16:45:54
Vijaya Raghavan T. Ramkumar, Elahe Arani, Bahram Zonooz

Abstract

Deep neural networks (DNNs) are often trained on the premise that the complete training data set is available ahead of time. In real-world scenarios, however, data often arrive in chunks over time. This raises an important question about the optimal strategy for training DNNs: whether to fine-tune the model on each chunk of incoming data (warm-start) or to retrain it from scratch on the entire corpus whenever a new chunk is available. While the latter is resource-intensive, recent work has pointed out the lack of generalization in warm-start models. Therefore, to strike a balance between efficiency and generalization, we introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for DNNs. LURE alternates between an unlearning phase, which selectively forgets undesirable information in the model through data-dependent weight reinitialization, and a relearning phase, which emphasizes learning generalizable features. We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings. We further show that it leads to more robust and better-calibrated models.
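The abstract describes the loop but not the selection criterion, so the sketch below is only an illustration of the learn/unlearn/relearn cycle it outlines. It is a minimal PyTorch sketch, assuming a magnitude-based mask as a stand-in for the paper's data-dependent reinitialization; the function names and hyperparameters (reinit_fraction, epoch counts) are hypothetical, not taken from the paper.

```python
# Minimal sketch of a learn/unlearn/relearn loop over data chunks.
# The magnitude-based mask in unlearn() is an assumption standing in
# for the paper's data-dependent criterion.
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_epochs(model, loader, epochs, lr=0.1):
    """Plain SGD training, used for both the learn and relearn phases."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()


def unlearn(model, reinit_fraction=0.2):
    """Selectively 'forget' by reinitializing a fraction of each layer's
    weights (here: the smallest-magnitude ones, as a hypothetical proxy
    for the paper's data-dependent selection)."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                w = m.weight
                k = int(reinit_fraction * w.numel())
                if k == 0:
                    continue
                threshold = w.abs().flatten().kthvalue(k).values
                mask = w.abs() <= threshold          # weights to forget
                fresh = torch.empty_like(w)
                nn.init.kaiming_uniform_(fresh)      # fresh random values
                w[mask] = fresh[mask]


def lure(model, chunk_loaders, learn_epochs=5, relearn_epochs=5):
    """Online loop: learn each incoming chunk, then unlearn and relearn."""
    seen = []
    for loader in chunk_loaders:                     # data arrives in chunks
        seen.append(loader)
        train_epochs(model, loader, learn_epochs)    # learn (warm-start)
        unlearn(model)                               # selective forgetting
        for past in seen:                            # relearn on seen data
            train_epochs(model, past, relearn_epochs)
    return model
```

The key departure from plain warm-starting is the unlearn step: rather than keeping all fine-tuned weights, a fraction is reset to fresh random values before relearning on the accumulated data, which is what the abstract credits for the gains in generalization, robustness, and calibration.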

URL

https://arxiv.org/abs/2303.10455

PDF

https://arxiv.org/pdf/2303.10455.pdf

