Abstract
Popular zero-shot models suffer from artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like Prior Matching -- often by significant margins -- in 17 out of 21 datasets.
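The core idea -- rebalancing a pretrained model's class probabilities so the induced label marginal matches a downstream estimate -- can be sketched as entropic optimal transport between examples and classes, solved with Sinkhorn iterations. This is an illustrative sketch, not the paper's exact procedure: the function name, the regularization value, and the use of negative log-probabilities as transport costs are assumptions made here for clarity.

```python
import numpy as np

def ot_rebalance(probs, target_dist, reg=0.05, n_iter=200):
    """Rebalance per-example class probabilities so that the induced
    label marginal matches target_dist, via entropic OT (Sinkhorn).

    probs: (n, k) array, each row a predicted class distribution.
    target_dist: (k,) estimated downstream label distribution.
    Returns (predictions, transport_plan).
    """
    n, k = probs.shape
    # Cost of assigning example i to class j: negative log-probability.
    C = -np.log(probs + 1e-12)
    K = np.exp(-C / reg)                # Gibbs kernel
    r = np.full(n, 1.0 / n)             # uniform mass over examples
    c = np.asarray(target_dist, float)  # desired label marginal
    u = np.ones(n)
    for _ in range(n_iter):             # Sinkhorn scaling iterations
        v = c / (K.T @ u)
        u = r / (K @ v)
    plan = u[:, None] * K * v[None, :]  # marginals approx (r, c)
    # Each example is assigned the class receiving most of its mass.
    return plan.argmax(axis=1), plan

# Toy example: a classifier biased toward class 0, corrected toward a
# (hypothetical) balanced downstream label distribution.
probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.9, 0.1], [0.2, 0.8]])
preds, plan = ot_rebalance(probs, np.array([0.5, 0.5]))
```

Plain row-wise argmax on `probs` would assign three of the four examples to class 0; the transport plan instead respects the target marginal, so the least-confident class-0 prediction flips to class 1.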
URL
https://arxiv.org/abs/2404.08461