
POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation

2024-01-11 00:03:36
Shilong Pan, Zhiliang Tian, Liang Ding, Zhen Huang, Zhihua Wen, Dongsheng Li

Abstract

Low-resource languages (LRLs) face challenges in supervised neural machine translation due to limited parallel data, prompting research into unsupervised methods. Unsupervised neural machine translation (UNMT) methods, including back-translation, transfer learning, and pivot-based translation, offer practical solutions for LRL translation, but they are hindered by issues like synthetic data noise, language bias, and error propagation, which can potentially be mitigated by Large Language Models (LLMs). LLMs have advanced NMT with in-context learning (ICL) and supervised fine-tuning methods, but insufficient training data results in poor performance in LRLs. We argue that LLMs can mitigate linguistic noise with auxiliary languages to improve translations in LRLs. In this paper, we propose Probability-driven Meta-graph Prompter (POMP), a novel approach employing a dynamic, sampling-based graph of multiple auxiliary languages to enhance LLMs' translation capabilities for LRLs. POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training. We use the BLEURT metric to evaluate the translations and back-propagate rewards, estimated from the scores, to update the probabilities of auxiliary languages in the paths. Our experiments show significant improvements in the translation quality of three LRLs, demonstrating the effectiveness of our approach.
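
The abstract outlines POMP's core loop: sample auxiliary-language paths from a per-source meta-graph, prompt an LLM with them, score the output with BLEURT, and feed the score back into the languages' sampling probabilities. The Python sketch below illustrates only that loop under loose assumptions: the meta-graph is reduced to a flat set of candidate auxiliary languages rather than the paper's full DAG, the LLM call and BLEURT scorer are stubbed out, and the exponentiated-weight update is one plausible rule, not necessarily the paper's.

    import math
    import random

    # Hypothetical set of auxiliary languages with uniform starting
    # probabilities; the paper builds a per-source-language DAG instead.
    AUX_LANGS = ["en", "fr", "ru", "hi"]
    probs = {lang: 1.0 / len(AUX_LANGS) for lang in AUX_LANGS}

    def sample_path(max_len=2):
        """Sample a pivot path of auxiliary languages without repeats,
        weighted by the current probabilities."""
        candidates = list(probs)
        path = []
        for _ in range(max_len):
            weights = [probs[lang] for lang in candidates]
            choice = random.choices(candidates, weights=weights, k=1)[0]
            path.append(choice)
            candidates.remove(choice)
        return path

    def bleurt_score(translation, reference):
        """Stub for the BLEURT metric; a real run would load a BLEURT
        checkpoint and score the hypothesis against the reference."""
        return random.random()  # placeholder reward in [0, 1]

    def update_probs(path, reward, lr=0.1):
        """Exponentiated-weight update (an assumed rule): boost languages
        that appeared on a high-reward path, then renormalize."""
        for lang in path:
            probs[lang] *= math.exp(lr * reward)
        total = sum(probs.values())
        for lang in probs:
            probs[lang] /= total

    for step in range(100):
        path = sample_path()
        # In POMP, the sampled path would be rendered into a prompt asking
        # the LLM to translate via these auxiliary languages; skipped here.
        translation, reference = "hyp", "ref"  # placeholders
        reward = bleurt_score(translation, reference)
        update_probs(path, reward)

    print(sorted(probs.items(), key=lambda kv: -kv[1]))

Over many iterations, languages that keep appearing on high-scoring paths accumulate probability mass, which is the behavior the abstract describes when it speaks of back-propagating score-based rewards to update the auxiliary-language probabilities.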


URL

https://arxiv.org/abs/2401.05596

PDF

https://arxiv.org/pdf/2401.05596.pdf

