Abstract
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption that no human-translated parallel data is available, yet little work has been done so far on highlighting its advantages over supervised methods or analyzing its output in aspects other than translation accuracy. We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision, in high- and low-resource setups. We measure the quality of the NMT output and compare the generated sequences' word order and semantic similarity to the source and reference sentences. We also use Layer-wise Relevance Propagation to evaluate the contribution of the source and target sentences to the result, extending the findings of previous works to the UNMT paradigm.
URL
https://arxiv.org/abs/2312.12588