Abstract
Knowledge Distillation (KD) transfers knowledge from a large pre-trained teacher network to a compact and efficient student network, making the student suitable for deployment on resource-limited media terminals. However, traditional KD methods require balanced data to ensure robust training, which is often unavailable in practical applications. In such scenarios, a few head categories occupy a substantial proportion of the examples. This imbalance biases the trained teacher network towards the head categories, resulting in severe performance degradation on the underrepresented tail categories for both the teacher and student networks. In this paper, we propose a novel framework called Knowledge Rectification Distillation (KRDistill) that addresses the imbalanced knowledge inherited by the teacher network by incorporating balanced category priors. Furthermore, we rectify the biased predictions produced by the teacher network, with a particular focus on the tail categories. Consequently, the teacher network can provide balanced and accurate knowledge to train a reliable student network. Extensive experiments conducted on various long-tailed datasets demonstrate that KRDistill can effectively train reliable student networks in realistic scenarios of data imbalance.
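The abstract's core idea of rectifying a biased teacher with balanced category priors can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released code: it subtracts a log-frequency class prior from the teacher's logits (in the spirit of logit adjustment for long-tailed data) before computing a standard temperature-scaled distillation loss. The function name `rectified_distillation_loss`, the scaling factor `tau`, and the toy class counts are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): distillation where the teacher's
# logits are rectified with a balanced category prior before serving as soft targets.
import torch
import torch.nn.functional as F

def rectified_distillation_loss(student_logits, teacher_logits, class_counts,
                                temperature=2.0, tau=1.0):
    # Balanced category prior: log of the empirical class frequencies.
    prior = torch.log(class_counts.float() / class_counts.sum())
    # Rectify the teacher's biased predictions by removing the head-category prior,
    # which shifts probability mass back toward tail categories.
    rectified_teacher = teacher_logits - tau * prior
    # Standard temperature-scaled KL distillation on the rectified soft targets.
    soft_targets = F.softmax(rectified_teacher / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

# Example usage with random tensors standing in for a long-tailed batch.
if __name__ == "__main__":
    counts = torch.tensor([5000, 1000, 100, 10])   # head -> tail class sizes (toy values)
    s_logits = torch.randn(8, 4)                   # student outputs
    t_logits = torch.randn(8, 4)                   # biased teacher outputs
    print(rectified_distillation_loss(s_logits, t_logits, counts).item())
```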
URL
https://arxiv.org/abs/2409.07694