Abstract
Recent advances such as GPT and BERT have shown that pre-training a transformer language model and then fine-tuning it can substantially improve downstream NLP systems. However, this framework still has fundamental limitations in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework that can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks to a target task. In particular, we propose using three kinds of transfer tasks, namely natural language inference, sentiment classification, and next action prediction, to further train BERT on top of its pre-trained model, giving the model a better initialization for the target task. We take story ending prediction as the target task in our experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments provide helpful suggestions on how to select transfer tasks. Error analysis shows the strengths and weaknesses of BERT-based models for story ending prediction.
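The following is a minimal sketch of the two-stage transfer the abstract describes, assuming the HuggingFace `transformers` library; the paper's actual code, datasets, and hyperparameters are not given here, so the helper `train()`, the checkpoint path, and the toy examples (NLI as the transfer task, story ending prediction as the target) are illustrative assumptions only.

```python
# Sketch: pre-trained BERT -> further training on a supervised transfer task
# (e.g., NLI) -> fine-tuning on the target task (story ending prediction).
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

TRANSFER_CKPT = "transbert-nli"  # hypothetical path for the intermediate checkpoint


def train(model, tokenizer, texts, labels, epochs=1, lr=2e-5):
    """Generic fine-tuning loop shared by the transfer and target stages (toy version)."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    labels = torch.tensor(labels)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        optim.zero_grad()
        out = model(**enc, labels=labels)
        out.loss.backward()
        optim.step()
    return model


tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Stage 1: start from generally pre-trained BERT and further train it on a
# semantically related supervised transfer task (here: 3-way NLI).
nli_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
nli_model = train(nli_model, tokenizer,
                  ["A man is playing guitar. [SEP] A person makes music."],
                  [0])  # toy NLI example: 0 = entailment
nli_model.save_pretrained(TRANSFER_CKPT)

# Stage 2: initialize the target-task model from the transfer checkpoint
# (the encoder weights carry over; the 2-way classifier head is re-initialized)
# and fine-tune on story ending prediction, framed as scoring a candidate ending.
target_model = BertForSequenceClassification.from_pretrained(
    TRANSFER_CKPT, num_labels=2, ignore_mismatched_sizes=True)
target_model = train(target_model, tokenizer,
                     ["<four-sentence story context> [SEP] <candidate ending>"],
                     [1])  # toy label: 1 = correct ending
```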
URL
https://arxiv.org/abs/1905.07504