Story Ending Prediction by Transferable BERT

2019-05-21 02:37:11
Zhongyang Li, Xiao Ding, Ting Liu

Abstract

Recent advances such as GPT and BERT have shown that pre-training a Transformer language model and then fine-tuning it can substantially improve downstream NLP systems. However, this framework still has fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks to a target task. In particular, we propose three transfer tasks, namely natural language inference, sentiment classification, and next action prediction, to further train BERT starting from the pre-trained model. This gives the model a better initialization for the target task. We take story ending prediction as the target task in our experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give helpful suggestions on how to select transfer tasks, and error analysis shows the strengths and weaknesses of BERT-based models for story ending prediction.
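The two-stage recipe the abstract describes (pre-trained BERT, further training on a supervised transfer task, then fine-tuning on the target task) can be sketched with off-the-shelf tooling. Below is a minimal sketch assuming the Hugging Face transformers library rather than the authors' released code; the checkpoint name "transbert-nli", the label counts, and the pairwise scoring of candidate endings are illustrative assumptions, not the paper's exact setup.

import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Stage 1: further train pre-trained BERT on a semantically related
# supervised transfer task, e.g. natural language inference with three
# labels (entailment / neutral / contradiction).
transfer_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
# ... fine-tune transfer_model on the transfer-task corpus here ...
transfer_model.save_pretrained("transbert-nli")  # hypothetical checkpoint

# Stage 2: initialize the target-task model from the transfer-trained
# encoder instead of vanilla BERT. The classification head is re-initialized
# (ignore_mismatched_sizes) because the target task has two labels
# (right vs. wrong ending).
target_model = BertForSequenceClassification.from_pretrained(
    "transbert-nli", num_labels=2, ignore_mismatched_sizes=True
)

# Score one (story context, candidate ending) pair; at test time the
# candidate ending with the higher "correct" probability is chosen.
inputs = tokenizer(
    "Tom studied all week for his final exam.",  # story context (abridged)
    "He passed with the highest grade.",         # candidate ending
    return_tensors="pt",
)
with torch.no_grad():
    probs = target_model(**inputs).logits.softmax(dim=-1)
print(probs)

Whether the transfer-task classifier head is kept or discarded, and which transfer task is chosen, are exactly the kinds of decisions the paper's comparative experiments address.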

URL

https://arxiv.org/abs/1905.07504

PDF

https://arxiv.org/pdf/1905.07504.pdf

