Can ChatGPT Rival Neural Machine Translation? A Comparative Study

2024-01-10 14:20:33
Zhaokun Jiang, Ziyin Zhang

Abstract

Motivated by the growing interest in using large language models (LLMs) for translation, this paper compares ChatGPT, as a representative LLM, with mainstream neural machine translation (NMT) engines on the task of translating Chinese diplomatic texts into English. Specifically, we examine the translation quality of ChatGPT and the NMT engines as measured by four automated metrics and by human evaluation based on an error typology and six analytic rubrics. Our findings show that automated metrics yield similar scores for ChatGPT, across different prompts, and for the NMT systems, while human annotators tend to assign noticeably higher scores to ChatGPT when it is given an example or contextual information about the translation task. Pairwise correlations between the automated metrics and the dimensions of human evaluation are weak and non-significant, suggesting that the two methods of translation quality assessment diverge. These findings provide insight into the potential of ChatGPT as a capable machine translator and into the influence of prompt engineering on its performance.
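
The pairwise correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Spearman rank correlation is an assumption here. The score table, its column names, and the helper functions are hypothetical and invented for illustration only; they are not the paper's data or code. Significance testing is left out.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def pairwise_correlations(table):
    """Correlate every pair of score columns in a {name: scores} table."""
    return {
        (a, b): spearman(table[a], table[b])
        for a, b in combinations(table, 2)
    }


# Hypothetical per-segment scores (assumption: made-up numbers, not results).
scores = {
    "auto_metric": [0.41, 0.55, 0.38, 0.62, 0.47],
    "human_accuracy": [3, 4, 4, 5, 2],
    "human_fluency": [4, 4, 3, 5, 3],
}

if __name__ == "__main__":
    for pair, rho in pairwise_correlations(scores).items():
        print(pair, round(rho, 3))
```

In practice one would also report a p-value for each pair, since the abstract describes the correlations as non-significant as well as weak.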

URL

https://arxiv.org/abs/2401.05176

PDF

https://arxiv.org/pdf/2401.05176.pdf

