Abstract
Corruptions due to data perturbations and label noise are prevalent in datasets from unreliable sources and pose significant threats to model training. Despite existing efforts to develop robust models, current learning methods commonly overlook the possible co-existence of both corruptions, which limits their effectiveness and practicality. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework that handles both types of corruption (i.e., data and label) simultaneously, without prior knowledge of their specifics. We propose hybrid adversarial training over multiple potential adversarial perturbations, alongside semi-supervised learning based on class-rebalancing sample selection, to enhance the model's resilience to dual corruption. On the one hand, in the proposed adversarial training, the perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. This is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels. Semi-supervised learning is then performed with the noisy labels discarded. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
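The abstract does not spell out how the class-rebalancing selection works, so the following is only an illustrative sketch of the general idea, not the ERAT procedure. It assumes the common small-loss heuristic (a label is more likely clean when the sample's training loss is small) and ranks samples within each class rather than globally. The function name `select_clean_per_class` and the parameter `keep_ratio` are hypothetical.

```python
from collections import defaultdict


def select_clean_per_class(losses, labels, keep_ratio=0.5):
    """Return sorted indices of samples treated as clean-labeled.

    Assumption (not from the abstract): a sample's label is more likely
    clean when its training loss is small. Ranking is done inside each
    class so that classes with larger losses overall are not crowded out
    by easy classes, which is what a single global threshold would do.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(losses) != len(labels):
        raise ValueError("losses and labels must have equal length")

    # Group sample indices by their (possibly noisy) label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears.
        k = max(1, int(len(indices) * keep_ratio))
        ranked = sorted(indices, key=lambda i: losses[i])
        selected.extend(ranked[:k])
    return sorted(selected)


# Toy data: class 1 is harder, so all of its losses are larger.
losses = [0.1, 0.2, 0.9, 0.3, 1.1, 1.2, 2.5, 1.3]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clean = select_clean_per_class(losses, labels, keep_ratio=0.5)
```

On this toy data, a single global cut keeping the four smallest losses would select only class 0 samples, whereas the per-class ranking keeps two samples from each class. In a semi-supervised setup of the kind the abstract describes, the unselected samples would have their labels discarded and be treated as unlabeled data.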
URL
https://arxiv.org/abs/2405.04191