Abstract
In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and observe that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. From our analysis of the DKL loss, we identify two areas for improvement. First, we address the limitation of DKL in scenarios such as knowledge distillation by breaking its asymmetry property in training optimization. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Second, we introduce global information into DKL for intra-class consistency regularization. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness through experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art performance on both tasks, demonstrating its substantial practical merits. Code and models will be available soon at this https URL.
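For context, the sketch below shows the standard temperature-scaled KL-divergence loss used in knowledge distillation, i.e., the quantity the abstract says can be decomposed into a weighted MSE term plus a Cross-Entropy term with soft labels. It is a minimal, hypothetical PyTorch-style illustration, not the paper's released DKL/IKL implementation; the function name kd_kl_loss and the temperature value are assumptions made for the example.

    # Minimal sketch (not the paper's code): standard KL-divergence loss for
    # knowledge distillation, the starting point that the paper decomposes
    # into a weighted MSE term and a soft-label Cross-Entropy term.
    import torch
    import torch.nn.functional as F

    def kd_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   T: float = 4.0) -> torch.Tensor:
        """KL(teacher || student) on temperature-softened distributions."""
        log_p_student = F.log_softmax(student_logits / T, dim=1)
        p_teacher = F.softmax(teacher_logits / T, dim=1)
        # batchmean reduction; T**2 rescales gradients back to the original logit scale
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

    # usage
    student_logits = torch.randn(8, 100)   # e.g. a batch of 8 samples, 100 classes
    teacher_logits = torch.randn(8, 100)
    loss = kd_kl_loss(student_logits, teacher_logits)

Note that this loss is asymmetric in how gradients flow to its two arguments; the first enhancement described in the abstract targets exactly that asymmetry so that the wMSE component remains effective during training.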
URL
https://arxiv.org/abs/2305.13948