Abstract
The primary goal of training early convolutional neural networks (CNNs) was to maximize the generalization performance of the model. However, since the introduction of the expected calibration error (ECE), which quantifies how well a model's predicted confidence matches its actual accuracy, research on training well-calibrated models has been in progress. We hypothesized that a gap between the supervision criteria used during training and those used at inference leads to overconfidence, and investigated whether performing label distribution learning (LDL) would enhance model calibration in CNN training. To verify this hypothesis, we used a simple LDL setting combined with recent data augmentation techniques. Based on a series of experiments, we obtained the following results: 1) state-of-the-art knowledge distillation (KD) methods significantly impede model calibration; 2) training using LDL with recent data augmentation can have excellent effects on model calibration and even on generalization performance; 3) online LDL brings additional improvements in model calibration and accuracy under long training schedules, especially for large models. Using the proposed approach, we simultaneously achieved lower ECE and higher generalization performance on the image classification datasets CIFAR10, 100, STL10, and ImageNet. We also performed several visualizations and analyses and observed several interesting behaviors of CNN training with LDL.
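To make the central metric concrete, the following is a minimal sketch of the standard binned ECE computation the abstract refers to (the binning scheme and `n_bins=15` are common conventions, not details taken from this paper): predictions are grouped into confidence bins, and ECE is the weighted average gap between each bin's mean confidence and its empirical accuracy.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted mean |accuracy - confidence| over confidence bins.

    confidences: array of max-softmax confidences in [0, 1]
    correct:     binary array, 1 where the prediction was right
    """
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # half-open bins (lo, hi]; samples with conf == 0 fall in the first bin
        mask = (confidences > lo) & (confidences <= hi) if lo > 0 \
            else (confidences >= lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

# A perfectly calibrated toy case: confidence 0.8 with 80% accuracy gives ECE 0.
conf = np.full(10, 0.8)
corr = np.array([1] * 8 + [0] * 2)
print(expected_calibration_error(conf, corr))  # 0.0
```

An overconfident model (high confidence, lower accuracy) yields a large ECE, which is exactly the failure mode the paper attributes to the train/inference supervision gap.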
URL
https://arxiv.org/abs/2301.13444