Abstract
We propose a voting-driven semi-supervised approach that automatically acquires the typical duration of an event and uses it as pseudo-labeled data. Human evaluation shows that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. On the temporal commonsense QA task, experimental results show that, using pseudo-labeled examples for only 400 events, we achieve performance comparable to existing BERT-based weakly supervised approaches that require a significant amount of training examples. Compared with the RoBERTa baselines, our best approach establishes state-of-the-art performance with a 7% improvement in Exact Match.
URL
https://arxiv.org/abs/2403.18504