Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks

2023-05-26 20:17:18
Ketaki Joshi, Raghavendra Pradyumna Pothukuchi, Andre Wibisono, Abhishek Bhattacharjee

Abstract

Continual learning on sequential data is critical for many machine learning (ML) deployments. Unfortunately, LSTM networks, which are commonly used to learn on sequential data, suffer from catastrophic forgetting and are limited in their ability to learn multiple tasks continually. We discover that catastrophic forgetting in LSTM networks can be overcome in two novel and readily implementable ways -- separating the LSTM memory either for each task or for each target label. Our approach eschews the need for explicit regularization, hypernetworks, and other complex methods. We quantify the benefits of our approach on recently proposed LSTM networks for computer memory access prefetching, an important sequential learning problem in ML-based computer system optimization. Compared to state-of-the-art weight regularization methods for mitigating catastrophic forgetting, our approach is simple, effective, and enables faster learning. We also show that our proposal enables the use of small, non-regularized LSTM networks for complex natural language processing in the offline learning scenario, which was previously considered difficult.
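
The memory-separation idea lends itself to a short sketch. Below is a minimal, illustrative PyTorch implementation of the per-task variant described in the abstract: a single shared LSTM whose hidden and cell states are stored separately per task, so training on one task cannot overwrite another task's recurrent memory. Keying the state dictionary by target label instead of task ID would give the per-label variant. The class and method names here are our own illustration under stated assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TaskSeparatedLSTM(nn.Module):
    """One shared LSTM; hidden/cell state kept separately per task.

    Keying `self.states` by target label instead of task id would give
    the per-label variant mentioned in the abstract. (Hypothetical
    sketch, not the authors' implementation.)
    """

    def __init__(self, input_size: int, hidden_size: int, num_layers: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.states = {}  # task_id -> (h, c); one recurrent memory per task

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # First time a task is seen, start it from a zero state.
        if task_id not in self.states:
            zeros = x.new_zeros(self.num_layers, x.size(0), self.hidden_size)
            self.states[task_id] = (zeros, zeros.clone())
        h, c = self.states[task_id]
        out, (h, c) = self.lstm(x, (h, c))
        # Detach so stored states do not retain old computation graphs.
        # (Assumes a fixed batch size per task, for simplicity.)
        self.states[task_id] = (h.detach(), c.detach())
        return out

# Usage: sequences from different tasks never share recurrent memory.
model = TaskSeparatedLSTM(input_size=32, hidden_size=64)
x = torch.randn(8, 10, 32)      # (batch, seq_len, input_size)
out_a = model(x, task_id=0)     # updates only task 0's memory
out_b = model(x, task_id=1)     # task 1 starts fresh; task 0 untouched
```

The design point the sketch highlights is that the network weights remain fully shared; only the recurrent state is partitioned, which is why the approach needs no explicit regularization or hypernetworks.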

URL

https://arxiv.org/abs/2305.17244

PDF

https://arxiv.org/pdf/2305.17244.pdf

