Abstract
Despite the progress of semi-supervised learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address these issues, we propose two methods: Variational Confidence Calibration (VCC) and Influence-Function-based Unlabeled Sample Elimination (INFUSE). VCC is a universal plugin for SSL confidence calibration that uses a variational autoencoder to select more accurate pseudo labels based on three types of consistency scores. INFUSE is a data pruning method that constructs a core dataset of unlabeled examples under SSL. Our methods are effective across multiple datasets and settings, reducing classification error rates and saving training time. Together, VCC-INFUSE reduces the error rate of FlexMatch on the CIFAR-100 dataset by 1.08% while saving nearly half of the training time.
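For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of confidence-thresholded pseudo-label selection, in which an unlabeled example is kept only when the classifier's top softmax probability clears a fixed cutoff. It is not VCC or INFUSE; the function names, the example logits, and the 0.95 threshold are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(batch_logits, threshold=0.95):
    """Keep unlabeled examples whose top class probability meets the threshold.

    Returns a list of (example_index, pseudo_label, confidence) tuples.
    The threshold value is an illustrative assumption, not taken from the paper.
    """
    selected = []
    for idx, logits in enumerate(batch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((idx, probs.index(confidence), confidence))
    return selected


if __name__ == "__main__":
    # Hypothetical classifier outputs for three unlabeled examples.
    logits = [[8.0, 0.5, 0.2], [1.0, 1.1, 0.9], [0.1, 0.3, 6.0]]
    for idx, label, conf in select_pseudo_labels(logits):
        print(idx, label, round(conf, 3))
```

Because the selection depends entirely on the softmax confidence, a miscalibrated classifier will admit wrong pseudo labels or reject correct ones, which is the weakness the abstract attributes to such methods.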
URL
https://arxiv.org/abs/2404.11947