CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

2024-04-22 11:52:40
Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu

Abstract

In this paper, we present a simple yet effective contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints. Unlike traditional knowledge distillation methods that concentrate on maximizing feature similarities or preserving class-wise semantic correlations between teacher and student features, our method attempts to recover the "dark knowledge" by aligning sample-wise teacher and student logits. Specifically, our method first minimizes logit differences within the same sample by considering their numerical values, thus preserving intra-sample similarities. Next, we bridge semantic disparities by leveraging dissimilarities across different samples. Note that constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs. The positive pair consists of the teacher's and student's logits derived from an identical sample, while the negative pairs are formed from logits of different samples. With this formulation, our method benefits from the simplicity and efficiency of contrastive learning through the optimization of InfoNCE, yielding a run-time complexity far below $O(n^2)$, where $n$ is the total number of training samples. Furthermore, our method eliminates the need for hyperparameter tuning, particularly of temperature parameters and large batch sizes. We conduct comprehensive experiments on three datasets: CIFAR-100, ImageNet-1K, and MS COCO. Experimental results clearly confirm the effectiveness of the proposed method on both image classification and object detection tasks. Our source code will be publicly available at this https URL.
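As a concrete illustration of the pairing described in the abstract, below is a minimal PyTorch sketch of an InfoNCE-style loss over sample-wise teacher and student logits: the same-sample pair is the positive, and cross-sample pairs in the batch serve as negatives. The function name `ckd_loss`, the L2 normalization, and the optional temperature argument are illustrative assumptions rather than the paper's exact formulation (the abstract in fact reports that the method removes the need for temperature tuning).

    import torch
    import torch.nn.functional as F

    def ckd_loss(student_logits, teacher_logits, temperature=1.0):
        """InfoNCE-style sample-wise contrastive KD loss (illustrative sketch).

        Positive pair: teacher and student logits of the same sample.
        Negative pairs: logits taken from different samples in the batch.
        """
        # L2-normalize logits so the dot products below are cosine similarities.
        s = F.normalize(student_logits, dim=1)  # (B, C)
        t = F.normalize(teacher_logits, dim=1)  # (B, C)

        # Pairwise student-teacher similarity matrix; entry (i, j) compares
        # student sample i with teacher sample j.
        sim = s @ t.t() / temperature           # (B, B)

        # Diagonal entries are the same-sample (positive) pairs, so the
        # InfoNCE objective reduces to cross-entropy with diagonal targets.
        targets = torch.arange(s.size(0), device=s.device)
        return F.cross_entropy(sim, targets)

A symmetric variant could add the teacher-to-student direction, and per the abstract the actual method also folds the intra-sample numerical alignment into the same contrastive objective; both refinements are beyond this sketch.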

URL

https://arxiv.org/abs/2404.14109

PDF

https://arxiv.org/pdf/2404.14109.pdf

