Abstract
Label noise is inherent in many deep learning tasks once the training set becomes large. A typical approach to tackling noisy labels is to use robust loss functions. Categorical cross entropy (CCE) is a successful loss function in many applications; however, it is also notorious for easily fitting samples with corrupted labels. In contrast, mean absolute error (MAE) is theoretically noise-tolerant, but it generally performs much worse than CCE in practice. In this work, we make three main contributions. First, to explain why MAE generally performs much worse than CCE, we introduce a new fundamental understanding of both losses by exposing their intrinsic sample weighting schemes from the perspective of each sample's gradient magnitude with respect to its logit vector. We find that MAE's differentiation degree over training examples is too small, so informative examples cannot contribute enough against non-informative ones during training; as a result, MAE generally underfits the training data when the noise rate is high. Second, based on this finding, we propose an improved MAE (IMAE), which inherits MAE's noise robustness while making the differentiation degree over training data points controllable, thereby addressing MAE's underfitting problem. Third, we empirically evaluate the effectiveness of IMAE against CCE and MAE with extensive experiments on image classification under synthetic corrupted labels and video retrieval under real noisy labels.
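The weighting argument above can be illustrated numerically. The sketch below is an assumption-laden reconstruction (not the paper's code): both losses are taken over softmax probabilities, with CCE defined as the usual log-likelihood and MAE as the L1 distance between the probability vector and the one-hot label, so the per-sample gradients w.r.t. the logits are p - y for CCE and 2*p_y*(p - y) for MAE. Comparing gradient norms on an "easy" (correctly classified) and a "hard" (misfit or mislabeled) sample shows that MAE's norms vary far less across samples than CCE's, which is the small "differentiation degree" the abstract refers to.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_norms(z, y):
    """L2 norms of the per-sample loss gradients w.r.t. the logits z.

    Assumes CCE = -log p_y and MAE = ||p - onehot(y)||_1 = 2*(1 - p_y),
    both computed on softmax probabilities p (a common setup; the paper's
    exact formulation may differ in constants).
    """
    p = softmax(z)
    one_hot = np.zeros_like(p)
    one_hot[y] = 1.0
    g_cce = p - one_hot                 # d CCE / dz
    g_mae = 2.0 * p[y] * (p - one_hot)  # d MAE / dz (scaled by p_y)
    return np.linalg.norm(g_cce), np.linalg.norm(g_mae)

# Easy sample: confidently correct; hard sample: confidently wrong.
cce_easy, mae_easy = grad_norms(np.array([5.0, 0.0, 0.0]), 0)
cce_hard, mae_hard = grad_norms(np.array([0.0, 0.0, 5.0]), 0)
```

Running this, CCE's gradient norm on the hard sample is orders of magnitude larger than on the easy one, while MAE's two norms are of comparable (small) size: MAE down-weights hard samples by the factor p_y, which makes it robust to wrong labels but also unable to emphasize informative examples.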
URL
https://arxiv.org/abs/1903.12141