Misleading Failures of Partial-input Baselines

2019-06-10 02:39:20

Shi Feng, Eric Wallace, Jordan Boyd-Graber

arXiv_CV

Abstract

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline gets high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns in the full input that are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples that were previously considered "hard". Our work provides a caveat for the use of partial-input baselines for dataset verification and creation.
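
The idea of a pattern that is trivial in the full input yet invisible to every partial-input model can be illustrated with a minimal sketch. The toy dataset below is an assumption made for illustration, not the authors' construction: each example is a pair of bits standing in for a premise and a hypothesis, and the label is their XOR. The names `make_dataset` and `best_accuracy` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from itertools import product


def make_dataset():
    # Hypothetical toy data: every (premise_bit, hypothesis_bit) pair once,
    # labeled by XOR. This is an illustrative assumption, not the paper's data.
    return [((p, h), p ^ h) for p, h in product([0, 1], repeat=2)]


def best_accuracy(dataset, view):
    """Accuracy of the best classifier that only sees view(x).

    For each distinct value of the view, the best possible prediction is the
    majority label among the examples sharing that value.
    """
    labels_by_view = defaultdict(Counter)
    for x, y in dataset:
        labels_by_view[view(x)][y] += 1
    correct = sum(max(counts.values()) for counts in labels_by_view.values())
    return correct / len(dataset)


data = make_dataset()
premise_only = best_accuracy(data, lambda x: x[0])
hypothesis_only = best_accuracy(data, lambda x: x[1])
full_input = best_accuracy(data, lambda x: x)

print(premise_only, hypothesis_only, full_input)  # 0.5 0.5 1.0
```

Both partial-input baselines stay at chance, so they would suggest the data is artifact-free, while a model that sees both parts solves every example with a trivial rule.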

URL

https://arxiv.org/abs/1905.05778

PDF

https://arxiv.org/pdf/1905.05778.pdf