Abstract
Knowledge distillation deals with the problem of training a smaller model (Student) from a high-capacity source model (Teacher) so that the Student retains most of the Teacher's performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible, for instance when the dataset is very large or when it poses privacy or safety concerns (e.g., biometric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without using any meta-data, we synthesize Data Impressions from the complex Teacher model and utilize them as surrogates for the original training data samples to transfer the Teacher's learning to the Student via knowledge distillation. We therefore dub our method "Zero-Shot Knowledge Distillation" and demonstrate that, on multiple benchmark datasets, our framework achieves generalization performance competitive with distillation using the actual training data samples.
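The abstract only outlines the approach at a high level. Below is a minimal, illustrative sketch (in PyTorch, which is an assumption) of what such a data-free pipeline could look like: surrogate inputs are optimized so that the Teacher's softmax output matches randomly sampled target class distributions, and the resulting "Data Impressions" are then used for standard distillation. The function names, the way targets are sampled, and all hyper-parameters here are hypothetical and are not taken from the paper.

```python
# Illustrative sketch of data-free distillation via synthesized "Data Impressions".
# Assumptions: the Teacher is a PyTorch classifier; target softmax vectors are
# sampled from a flat Dirichlet; impressions are crafted by gradient descent on
# the inputs. The paper's actual procedure and hyper-parameters may differ.
import torch
import torch.nn.functional as F

def synthesize_data_impressions(teacher, input_shape, num_classes,
                                num_samples=100, steps=200, lr=0.01):
    """Craft surrogate inputs ("Data Impressions") using only the Teacher."""
    teacher.eval()
    # Hypothetical target class distributions, one per impression.
    targets = torch.distributions.Dirichlet(
        torch.ones(num_classes)).sample((num_samples,))
    x = torch.randn(num_samples, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Push the Teacher's softmax output toward the sampled target.
        loss = F.kl_div(F.log_softmax(teacher(x), dim=1), targets,
                        reduction="batchmean")
        loss.backward()
        opt.step()
    return x.detach()

def distill(student, teacher, impressions, temperature=20.0, lr=1e-3, epochs=10):
    """Standard knowledge distillation, but on the synthesized impressions."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    with torch.no_grad():
        soft_targets = F.softmax(teacher(impressions) / temperature, dim=1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.kl_div(
            F.log_softmax(student(impressions) / temperature, dim=1),
            soft_targets, reduction="batchmean") * temperature ** 2
        loss.backward()
        opt.step()
    return student
```

In this sketch the Student never sees the original training data: the only supervision comes from the Teacher's responses on the synthesized impressions, which is the core idea the abstract describes.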
URL
https://arxiv.org/abs/1905.08114