Abstract
Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. In this paper, we show how a recently developed Reinforcement Learning (RL) technique, Direct Preference Optimization (DPO), can be used to fine-tune MLLMs so that they obtain the gains of MBR without the additional computation at inference time. Our fine-tuned models significantly outperform base MLLMs without preference optimization on multiple NMT test sets. Our method boosts the translation performance of MLLMs using relatively small monolingual fine-tuning sets.
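For intuition, below is a minimal sketch of the MBR selection step: sample several candidate translations, then pick the one with the highest expected utility against the other samples treated as pseudo-references. The toy unigram-overlap utility and all names here are illustrative assumptions; the paper's actual utility metric and sampling setup are not specified in this abstract.

```python
# Minimal sketch of Minimum Bayes Risk (MBR) candidate selection.
# Assumption: `candidates` are translations sampled from the model, and
# `utility` is any pairwise similarity metric (e.g., sentence-level BLEU
# or COMET); the toy metric below is only a stand-in.

from typing import Callable, List

def mbr_select(candidates: List[str],
               utility: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest expected utility, where the
    expectation is approximated over the other sampled candidates."""
    def expected_utility(hyp: str) -> float:
        # Each sampled candidate acts as a pseudo-reference.
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

def unigram_overlap(hyp: str, ref: str) -> float:
    # Toy utility: Jaccard overlap of unigrams (stand-in for BLEU/COMET).
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

samples = ["the cat sat on the mat",
           "a cat sat on a mat",
           "the cat is sitting on the mat"]
print(mbr_select(samples, unigram_overlap))
```

Consistent with the abstract's description, one plausible way to use this for DPO fine-tuning would be to treat MBR-preferred candidates as the "chosen" responses and low-utility candidates as "rejected" ones, so the model learns the MBR preference directly; the exact pair construction is the paper's own.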
URL
https://arxiv.org/abs/2311.08380