Salient Span Masking for Temporal Understanding

2023-03-22 18:49:43
Jeremy R. Cole, Aditi Chaudhary, Bhuwan Dhingra, Partha Talukdar

Abstract

Salient Span Masking (SSM) has shown itself to be an effective strategy to improve closed-book question answering performance. SSM extends general masked language model pretraining by creating additional unsupervised training sentences that mask a single entity or date span, thus oversampling factual information. Despite the success of this paradigm, the span types and sampling strategies are relatively arbitrary and not widely studied for other tasks. Thus, we investigate SSM from the perspective of temporal tasks, where learning a good representation of various temporal expressions is important. To that end, we introduce Temporal Span Masking (TSM) intermediate training. First, we find that SSM alone improves the downstream performance on three temporal tasks by an average of +5.8 points. Further, we are able to achieve additional improvements (average +0.29 points) by adding the TSM task. These comprise the new best reported results on the targeted tasks. Our analysis suggests that the effectiveness of SSM stems from the sentences chosen in the training data rather than the mask choice: sentences with entities frequently also contain temporal expressions. Nonetheless, the additional targeted spans of TSM can still improve performance, especially in a zero-shot context.
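To make the masking recipe concrete, below is a minimal sketch of how a Temporal Span Masking training example might be constructed. This is not the paper's implementation: the toy regex stands in for a proper temporal expression tagger, and the mask token and function name are illustrative.

```python
import re

# Illustrative date matcher; a real pipeline would use a temporal
# expression tagger rather than this toy regex.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
    r"|\b\d{4}\b"
)

MASK_TOKEN = "[MASK]"  # placeholder; a T5-style model would use <extra_id_0>

def temporal_span_mask(sentence: str) -> str | None:
    """Mask a single temporal span, mirroring the SSM recipe of masking
    one salient span per training sentence. Returns None when the
    sentence contains no temporal expression and so yields no example."""
    match = DATE_PATTERN.search(sentence)
    if match is None:
        return None
    return sentence[: match.start()] + MASK_TOKEN + sentence[match.end():]

if __name__ == "__main__":
    print(temporal_span_mask("The treaty was signed on June 28, 1919 in Versailles."))
    # -> "The treaty was signed on [MASK] in Versailles."
```

Sentences for which the matcher finds no span are simply skipped, which is also how this style of pretraining ends up oversampling fact-bearing sentences.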

URL

https://arxiv.org/abs/2303.12860

PDF

https://arxiv.org/pdf/2303.12860.pdf

