Abstract
Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when they are trained only on clean data. In this paper, we show that augmenting the training data with sentences containing artificially introduced grammatical errors can make the system more robust to such errors. Combined with an automatic grammar error correction system, this approach recovers 1.5 of the 2.4 BLEU points lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
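The augmentation idea can be illustrated with a minimal sketch. The abstract does not say which error types are introduced or how often, so everything specific here is an assumption made for illustration: the two error types (article dropping and preposition substitution), the word lists, the probability p, and the function names inject_errors and augment are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical word lists; the abstract does not specify the error inventory.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "for", "to", "of", "with"]


def inject_errors(sentence, rng, p=0.3):
    """Return a copy of `sentence` with synthetic grammatical errors.

    Each article is dropped with probability p, and each preposition is
    replaced by a different preposition with probability p. Both error
    types are illustrative assumptions.
    """
    out = []
    for tok in sentence.split():
        low = tok.lower()
        if low in ARTICLES and rng.random() < p:
            continue  # article omission error
        if low in PREPOSITIONS and rng.random() < p:
            # preposition substitution error: pick any other preposition
            out.append(rng.choice([w for w in PREPOSITIONS if w != low]))
            continue
        out.append(tok)
    return " ".join(out)


def augment(pairs, rng, p=0.3):
    """Keep every clean (source, target) pair and add a noisy-source copy.

    The target side is left untouched, so the model sees an erroneous
    source paired with a clean translation.
    """
    augmented = []
    for src, tgt in pairs:
        augmented.append((src, tgt))
        noisy = inject_errors(src, rng, p)
        if noisy != src:
            augmented.append((noisy, tgt))
    return augmented
```

Under these assumptions, calling augment(pairs, random.Random(0)) on a list of clean sentence pairs returns the original pairs plus noisy-source variants that share the clean target. Passing an explicit random.Random instance keeps the corruption reproducible across runs.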
URL
https://arxiv.org/abs/1808.06267