Abstract
Many recent advances in computer vision are the result of healthy competition among researchers on high-quality, task-specific benchmarks. After a decade of active research, the accuracy of zero-shot learning (ZSL) models on the ImageNet benchmark remains far too low for practical object recognition applications. In this paper, we argue that the main reason behind this apparent lack of progress is the poor quality of this benchmark. We highlight major structural flaws in the current benchmark and analyze the different factors that affect the accuracy of ZSL models. We show that, once these flaws are accounted for, the actual classification accuracy of existing ZSL models is significantly higher than previously thought. We then introduce the notion of structural bias, which is specific to ZSL datasets. We discuss how this new form of bias allows for a trivial solution to the standard benchmark, and we conclude that a new benchmark is needed. Finally, we detail the semi-automated construction of a new benchmark that addresses these flaws.
URL
https://arxiv.org/abs/1904.04957