Abstract
A number of labeling systems based on text have been proposed to help monitor work on the United Nations (UN) Sustainable Development Goals (SDGs). Here, we present a systematic comparison of systems using a variety of text sources and show that systems differ considerably in their sensitivity (i.e., true-positive rate) and specificity (i.e., true-negative rate), have systematic biases (e.g., are more sensitive to specific SDGs relative to others), and are susceptible to the type and amount of text analyzed. We then show that an ensemble model that pools labeling systems alleviates some of these limitations, exceeding the labeling performance of all currently available systems. We conclude that researchers and policymakers should care about the choice of labeling system and that ensemble methods should be favored when drawing conclusions about the absolute and relative prevalence of work on the SDGs based on automated methods.
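To make the two evaluation measures and the idea of pooling concrete, the following is a minimal sketch for a single SDG label. It assumes a simple majority vote as the pooling rule, which is an illustrative stand-in and not necessarily the ensemble model used in the paper. The system names (`system_a`, `system_b`, `system_c`), the helper functions, and the toy labels are all hypothetical and invented for illustration; they are not results from the study.

```python
from typing import Dict, List, Tuple


def sensitivity_specificity(truth: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one SDG label.

    sensitivity = TP / (TP + FN), the true-positive rate
    specificity = TN / (TN + FP), the true-negative rate
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def majority_vote(system_labels: Dict[str, List[bool]]) -> List[bool]:
    """Pool systems: assign the label when more than half of the systems assign it.

    Assumption: plain majority voting, chosen only for illustration.
    """
    n_systems = len(system_labels)
    per_document_votes = zip(*system_labels.values())
    return [2 * sum(votes) > n_systems for votes in per_document_votes]


# Hypothetical toy data: does each of six documents relate to one given SDG?
truth = [True, True, False, False, True, False]
systems = {
    "system_a": [True, False, False, False, True, False],  # misses one relevant document
    "system_b": [True, True, True, False, True, True],     # over-labels irrelevant documents
    "system_c": [False, True, False, False, True, False],  # misses a different relevant document
}

ensemble = majority_vote(systems)

for name, labels in {**systems, "ensemble": ensemble}.items():
    sens, spec = sensitivity_specificity(truth, labels)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy setup the individual systems err in different directions, so pooling their votes cancels some of the errors. Whether pooling helps on real data depends on how correlated the systems' errors are.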
URL
https://arxiv.org/abs/2301.11353