Abstract
Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, because RL training is unstable, applying it successfully is challenging, especially in real-world systems that use deep models and large datasets. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study of how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, since it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, and in particular set a state-of-the-art result on the WMT17 Chinese-English translation task.
URL
https://arxiv.org/abs/1808.08866