A mean teacher algorithm for unlearning of language models

2025-04-18 00:34:19
Yegor Klochkov

Abstract

One of the goals of language model unlearning is to reduce memorization of selected text instances while retaining the model's general abilities. Despite various proposed methods, reducing memorization of large datasets without noticeable degradation in model utility remains challenging. In this paper, we investigate the mean teacher algorithm (Tarvainen & Valpola, 2017), a simple proximal optimization method from the continual learning literature that gradually modifies the teacher model. We show that the mean teacher can approximate a trajectory of slow natural gradient descent (NGD), which inherently seeks low-curvature updates that are less likely to degrade model utility. While slow NGD can suffer from vanishing gradients, we introduce a new unlearning loss called "negative log-unlikelihood" (NLUL) that avoids this problem. We show that the combination of mean teacher and NLUL improves some metrics on the MUSE benchmarks (Shi et al., 2024).
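
The "gradually modifies the teacher model" step can be illustrated with the exponential moving average (EMA) update from the original mean teacher method (Tarvainen & Valpola, 2017), in which the teacher weights slowly track the student weights. The sketch below is only that generic update on plain Python lists. The function name `ema_update`, the `decay` values, and the toy weights are assumptions for illustration; the abstract does not specify how the paper adapts the update for unlearning, and the NLUL loss is not shown because it is not defined here.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight a small step toward the matching student weight.

    A decay close to 1 makes the teacher change slowly, so it stays
    close to where it started (hypothetical helper, not the paper's code).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights: the student has moved away from the teacher in one coordinate.
teacher = [0.0, 1.0]
student = [1.0, 1.0]

# Three EMA steps with an exaggerated decay so the drift is easy to see.
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)

print(teacher)
```

In practice the same update would be applied to every parameter tensor of the model after each student optimization step, with a decay much closer to 1 than the value used in this toy run.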


URL

https://arxiv.org/abs/2504.13388

PDF

https://arxiv.org/pdf/2504.13388.pdf

