Abstract
Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline achieves high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets that contain trivial patterns in the full input which are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples previously considered "hard". Our work provides a caveat for the use of partial-input baselines in dataset verification and creation.
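To make the core claim concrete, here is a minimal sketch (not the paper's exact construction) of an artificial dataset whose label is encoded jointly across premise and hypothesis. A model seeing only one input faces a chance-level guess, while a trivial rule over the full input is perfect; all token names and the construction itself are illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical artifact: prepend a token to each side; the label depends
# on whether the two tokens MATCH, so neither side alone carries signal.
TOKENS = ["A", "B"]

def make_example():
    p = random.choice(TOKENS)   # artifact token in the premise
    h = random.choice(TOKENS)   # artifact token in the hypothesis
    label = int(p == h)         # label depends on BOTH inputs
    return p, h, label

data = [make_example() for _ in range(10000)]

# Full-input "model": reads both artifact tokens -> perfect accuracy.
full_acc = sum((p == h) == bool(y) for p, h, y in data) / len(data)

# Best premise-only model: predict the majority label per premise token.
# Since the label is independent of p alone, this stays near chance.
by_p = {t: Counter() for t in TOKENS}
for p, h, y in data:
    by_p[p][y] += 1
partial_acc = sum(max(by_p[t].values()) for t in TOKENS) / len(data)

print(f"full-input accuracy:   {full_acc:.3f}")
print(f"premise-only accuracy: {partial_acc:.3f}")
```

Because the artifact is an XOR-like function of the two inputs, no partial-input baseline, however strong, can detect it, yet the dataset is trivially cheatable with full input.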
URL
https://arxiv.org/abs/1905.05778