Abstract
Current state-of-the-art NMT systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable as it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be easily applied to existing NMT models by simply modifying their input data pipelines. We show that our framework can help improve the training time and the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time, while at the same time obtaining accuracy improvements of up to 2.2 BLEU.
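The abstract describes a sampling rule: at training step t, draw batches only from examples whose estimated difficulty does not exceed the model's current competence c(t). Below is a minimal sketch of that rule in Python, assuming difficulty is the corpus-level rank (CDF) of source sentence length and competence follows a square-root schedule, two of the choices discussed in the paper; the function names and the c0 parameter here are illustrative, not taken from the authors' released code.

```python
import numpy as np

def difficulty_scores(corpus):
    """Difficulty of each (src, tgt) pair as the empirical CDF of source length.

    The paper also considers word-rarity-based difficulty; length is the
    simpler variant and is what this sketch uses.
    """
    lengths = np.array([len(src.split()) for src, _ in corpus])
    ranks = lengths.argsort().argsort()  # 0 .. n-1, shortest sentence first
    return (ranks + 1) / len(lengths)    # values in (0, 1]

def competence(step, total_steps, c0=0.01):
    """Square-root competence schedule: starts at c0, reaches 1 at total_steps."""
    return min(1.0, float(np.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2)))

def sample_batch(corpus, difficulties, step, total_steps, batch_size, rng):
    """Uniformly sample a batch from examples the model is competent for."""
    c = competence(step, total_steps)
    eligible = np.flatnonzero(difficulties <= c)
    if eligible.size == 0:  # guard for tiny corpora early in training
        eligible = np.array([int(difficulties.argmin())])
    idx = rng.choice(eligible, size=batch_size, replace=True)
    return [corpus[i] for i in idx]

# Hypothetical usage: plug sample_batch into an existing input pipeline.
rng = np.random.default_rng(0)
corpus = [("a small example .", "ein kleines beispiel ."),
          ("a much longer example sentence .", "ein viel laengerer beispielsatz .")]
diffs = difficulty_scores(corpus)
batch = sample_batch(corpus, diffs, step=100, total_steps=10000,
                     batch_size=2, rng=rng)
```

This matches the paper's claim that the method only touches the input data pipeline: the model, loss, and optimizer are unchanged, and once c(t) reaches 1 the procedure reduces to ordinary uniform sampling.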
URL
https://arxiv.org/abs/1903.09848