Abstract
Salient Span Masking (SSM) has proven to be an effective strategy for improving closed-book question answering performance. SSM extends general masked language model pretraining by creating additional unsupervised training sentences that mask a single entity or date span, thus oversampling factual information. Despite the success of this paradigm, the span types and sampling strategies are relatively arbitrary and have not been widely studied for other tasks. We therefore investigate SSM from the perspective of temporal tasks, where learning a good representation of various temporal expressions is important. To that end, we introduce Temporal Span Masking (TSM) intermediate training. First, we find that SSM alone improves downstream performance on three temporal tasks by an average of +5.8 points. Adding the TSM task yields a further improvement (avg. +0.29 points). Together, these constitute the new best reported results on the targeted tasks. Our analysis suggests that the effectiveness of SSM stems from the sentences chosen in the training data rather than from the mask choice: sentences containing entities frequently also contain temporal expressions. Nonetheless, the additional targeted spans of TSM can still improve performance, especially in a zero-shot setting.
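To make the masking step concrete, below is a minimal Python sketch of SSM-style single-span masking. It assumes span annotations come from an upstream entity/temporal tagger (a hypothetical interface; the paper does not specify this code), and it produces a T5-style span-corruption pair by masking exactly one annotated span. Restricting `span_types` to temporal expressions would approximate the TSM variant described in the abstract.

```python
import random

def mask_salient_span(sentence, spans, span_types=("ENTITY", "DATE"), seed=None):
    """Mask one salient span of an allowed type (illustrative sketch).

    sentence   -- raw training sentence
    spans      -- list of (start, end, type) character offsets from a
                  hypothetical upstream tagger
    span_types -- span types to sample from; e.g. ("TIMEX",) would
                  target temporal expressions, as in TSM
    """
    rng = random.Random(seed)
    candidates = [s for s in spans if s[2] in span_types]
    if not candidates:
        return None  # skip sentences without a salient span
    start, end, _ = rng.choice(candidates)  # sample a single span to mask
    # Replace the span with a sentinel token; the target reconstructs it.
    masked_input = sentence[:start] + "<extra_id_0>" + sentence[end:]
    target = "<extra_id_0> " + sentence[start:end] + " <extra_id_1>"
    return masked_input, target

if __name__ == "__main__":
    sent = "The Berlin Wall fell on 9 November 1989."
    spans = [(4, 15, "ENTITY"), (24, 39, "DATE")]
    # Masking only DATE spans oversamples temporal information:
    # ('The Berlin Wall fell on <extra_id_0>.',
    #  '<extra_id_0> 9 November 1989 <extra_id_1>')
    print(mask_salient_span(sent, spans, span_types=("DATE",), seed=0))
```

Sampling one span per sentence, rather than random subword masking, is what biases training toward factual (and here, temporal) content.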
URL
https://arxiv.org/abs/2303.12860