Paper Reading AI Learner

What about em? How Commercial Machine Translation Fails to Handle Pronouns

2023-05-25 13:34:09
Anne Lauscher, Debora Nozza, Archie Crowley, Ehm Miltersen, Dirk Hovy

Abstract

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this "reality check", we study how three commercial MT systems translate 3rd-person pronouns. Concretely, we compare the translations of gendered vs. gender-neutral pronouns from English to five other languages (Danish, Farsi, French, German, Italian), and vice versa, from Danish to English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Similarly, gender neutrality is often not preserved. By surveying the opinions of affected native speakers from diverse languages, we provide recommendations to address the issue in future MT research.

URL

https://arxiv.org/abs/2305.16051

PDF

https://arxiv.org/pdf/2305.16051.pdf

