Abstract
Despite the remarkable performance of machine translation (MT), the translation of cultural elements in language, such as idioms, proverbs, and colloquial expressions, remains underexplored. This paper investigates how well state-of-the-art neural machine translation (NMT) systems and large language models (LLMs) translate proverbs, which are deeply rooted in cultural contexts. We construct a translation dataset of standalone proverbs and proverbs in conversation for four language pairs. Our experiments show that the studied models can achieve good translation between languages with similar cultural backgrounds, and that LLMs generally outperform NMT models in proverb translation. Furthermore, we find that current automatic evaluation metrics such as BLEU, chrF++, and COMET are inadequate for reliably assessing the quality of proverb translation, highlighting the need for more culturally aware evaluation metrics.
URL
https://arxiv.org/abs/2501.11953