Paper Reading AI Learner

Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training

2020-12-16 08:21:51
Chen Xing, Wencong Xiao, Yong Li, Wei Lin

Abstract

In this work, we propose to improve the effectiveness of language pre-training methods with the help of mis-predictions during pre-training. Neglecting words in the input sentence that have conflicting semantics with mis-predictions is likely to be the reason of generating mis-predictions at pre-training. Therefore, we hypothesis that mis-predictions during pre-training can act as detectors of the ill focuses of the model. If we train the model to focus more on the conflicts with the mis-predictions while focus less on the rest words in the input sentence, the mis-predictions can be more easily corrected and the entire model could be better trained. Towards this end, we introduce Focusing Less on Context of Mis-predictions(McMisP). In McMisP, we record the co-occurrence information between words to detect the conflicting words with mis-predictions in an unsupervised way. Then McMisP uses such information to guide the attention modules when a mis-prediction occurs. Specifically, several attention modules in the Transformer are optimized to focus more on words in the input sentence that have co-occurred rarely with the mis-predictions and vice versa. Results show that McMisP significantly expedites BERT and ELECTRA and improves their performances on downstream tasks.

Abstract (translated)

URL

https://arxiv.org/abs/2012.08789

PDF

https://arxiv.org/pdf/2012.08789.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot