Abstract
Machine Unlearning, the process of selectively eliminating the influence of certain data examples used during a model's training, has gained significant attention as a means for practitioners to comply with recent data protection regulations. However, existing unlearning methods face critical drawbacks, including their prohibitively high cost, often tied to a large number of hyperparameters, and the limitation that they can forget only relatively small portions of the data. As a result, retraining the model from scratch is often the quicker and more effective solution. In this study, we introduce Gradient-based and Task-Agnostic machine Unlearning ($\nabla \tau$), an optimization framework designed to efficiently remove the influence of a subset of the training data. It applies adaptive gradient ascent to the data to be forgotten while using standard gradient descent for the remaining data. $\nabla \tau$ offers several benefits over existing approaches. It enables the unlearning of large portions of the training dataset (up to 30%). It is versatile, supporting various unlearning tasks (such as subset forgetting or class removal) and applicable across different domains (images, text, etc.). Importantly, $\nabla \tau$ requires no hyperparameter tuning, making it a more appealing option than retraining the model from scratch. We evaluate the framework's effectiveness using a set of well-established Membership Inference Attack metrics, demonstrating performance improvements of up to 10% over state-of-the-art methods without compromising the original model's accuracy.
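To make the procedure concrete, below is a minimal PyTorch sketch of the two-step update the abstract describes: adaptive gradient ascent on batches to be forgotten, interleaved with standard gradient descent on batches to be retained. The function name `unlearn`, the choice of Adam as the adaptive optimizer, the learning rates, and the cross-entropy objective are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the unlearning loop suggested by the abstract: adaptive
# gradient ASCENT on the forget set, standard gradient DESCENT on the
# retain set. Optimizer choices and learning rates are assumptions.
import torch
import torch.nn.functional as F


def unlearn(model, forget_loader, retain_loader, epochs=1, device="cpu"):
    # An adaptive optimizer (here, Adam) drives the ascent on the data
    # to be forgotten; plain SGD handles the descent on the rest.
    ascent_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    descent_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(epochs):
        for (xf, yf), (xr, yr) in zip(forget_loader, retain_loader):
            xf, yf = xf.to(device), yf.to(device)
            xr, yr = xr.to(device), yr.to(device)

            # Gradient ascent on the forget batch: maximize the loss
            # by minimizing its negation.
            ascent_opt.zero_grad()
            (-F.cross_entropy(model(xf), yf)).backward()
            ascent_opt.step()

            # Standard gradient descent on the retain batch to
            # preserve accuracy on the remaining data.
            descent_opt.zero_grad()
            F.cross_entropy(model(xr), yr).backward()
            descent_opt.step()
    return model
```

Alternating the two updates within the same loop, rather than running a full ascent phase first, is one plausible reading of the abstract; it keeps the retain-set loss anchored while the forget-set influence is pushed out.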
URL
https://arxiv.org/abs/2403.14339