What about em? How Commercial Machine Translation Fails to Handle Pronouns

2023-05-25 13:34:09

Anne Lauscher, Debora Nozza, Archie Crowley, Ehm Miltersen, Dirk Hovy

arXiv_AI

arXiv_AI Survey Recommendation

Abstract

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this "reality check", we study how three commercial MT systems translate 3rd-person pronouns. Concretely, we compare the translations of gendered vs. gender-neutral pronouns from English to five other languages (Danish, Farsi, French, German, Italian), and vice versa, from Danish to English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Similarly, gender neutrality is often not preserved. By surveying the opinions of affected native speakers from diverse languages, we provide recommendations to address the issue in future MT research.

URL

https://arxiv.org/abs/2305.16051

PDF

https://arxiv.org/pdf/2305.16051.pdf