Paper Reading AI Learner

Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT

2018-07-24 08:09:22
Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, Jinsong Su

Abstract

Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], which have become a research hotspot in NMT. At present, these studies mainly apply the dominant automatic evaluation metrics, such as BLEU, to evaluate the overall translation quality with respect to both adequacy and fluency. However, such metrics are unable to accurately measure the ability of NMT systems to deal with the above-mentioned issues. In this paper, we propose two quantitative metrics, Otem and Utem, to automatically evaluate system performance in terms of over- and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between the gold reference and the system translation. We evaluate both metrics by comparing their scores with human evaluations, where the values of the Pearson Correlation Coefficient reveal their strong correlation. Moreover, in-depth analyses of various translation systems indicate some inconsistency between BLEU and our proposed metrics, highlighting the necessity and significance of our metrics.
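The abstract describes both metrics as proportions of mismatched n-grams between the reference and the system output. The paper's exact Otem/Utem formulas (n-gram orders, any length penalties) are not given here, so the following is only a minimal sketch of that idea: surplus hypothesis n-grams serve as an over-translation proxy, and missing reference n-grams as an under-translation proxy.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mismatch_proportions(reference, hypothesis, n=1):
    """Sketch of over-/under-translation scores via mismatched n-grams.

    This is an illustrative approximation, not the paper's exact Otem/Utem
    definition.  Returns (over, under) in [0, 1]; higher is worse.
    """
    ref = ngram_counts(reference.split(), n)
    hyp = ngram_counts(hypothesis.split(), n)
    # Over-translation proxy: hypothesis n-grams occurring more often than in the reference.
    over = sum(max(hyp[g] - ref[g], 0) for g in hyp)
    # Under-translation proxy: reference n-grams under-represented in the hypothesis.
    under = sum(max(ref[g] - hyp[g], 0) for g in ref)
    return over / (sum(hyp.values()) or 1), under / (sum(ref.values()) or 1)

# A repeated word raises the over score; a dropped word raises the under score.
over, under = mismatch_proportions("the cat sat on the mat",
                                   "the cat cat sat on the mat")
```

In this toy example the extra "cat" produces one surplus unigram out of seven, so the over score is 1/7 while the under score stays 0; dropping a reference word instead would raise the under score.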


URL

https://arxiv.org/abs/1807.08945

PDF

https://arxiv.org/pdf/1807.08945.pdf

