Abstract
Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which is used to train a classifier. On three relation extraction tasks, we find that users are able to train classifiers with comparable F1 scores 5-100$\times$ faster by providing explanations instead of just labels. Furthermore, given the inherent imperfection of labeling functions, we find that a simple rule-based semantic parser suffices.
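To make the pipeline concrete, the following is a minimal, hypothetical sketch of what a labeling function parsed from a natural language explanation might look like. The function name, the example dictionary format, and the explanation text are all illustrative assumptions, not taken from the BabbleLabble implementation; the abstain/positive/negative return convention follows common weak-supervision practice.

```python
# Hypothetical labeling function derived from the explanation:
#   'True, because the word "wife" appears between person 1 and person 2.'
# (Names and data format are illustrative, not from BabbleLabble itself.)

def lf_wife_between(example):
    """Return 1 (positive) if "wife" occurs between the two entity
    spans, else 0 (abstain)."""
    between = example["text"][example["p1_end"]:example["p2_start"]]
    return 1 if "wife" in between else 0

# Toy example: a candidate spouse relation between two person mentions.
example = {
    "text": "Tom Brady and his wife Gisele Bundchen attended the gala.",
    "p1_end": 9,     # end offset of "Tom Brady"
    "p2_start": 23,  # start offset of "Gisele Bundchen"
}
print(lf_wife_between(example))  # -> 1 ("wife" lies between the spans)
```

Many such noisy functions, each applied over a large pool of unlabeled candidates, produce the programmatic labels that train the final classifier.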
URL
https://arxiv.org/abs/1805.03818