Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach

2023-12-17 15:56:05
Zhaokun Jiang, Qianxi Lv, Ziyin Zhang

Abstract

The growing popularity of neural machine translation (NMT) and of large language models (LLMs) such as ChatGPT underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers who want to make informed decisions about, and judicious use of, these cutting-edge translation technologies, yet it remains underexplored. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. Using a wide range of extracted linguistic features, supervised classifiers distinguish the three translation types with high accuracy, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations exhibit greater similarity with NMT than with HT in most MDA dimensions, which is further corroborated by distance computing and visualization. These novel insights shed light on the interrelationships among the three translation types and have implications for the future advancement of NMT and generative AI.
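The "distance computing" step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say which features or which distance measure the authors use, so everything below is an assumption: the three-dimensional feature vectors are made-up toy values (not data or results from the paper), the Euclidean distance between group centroids is just one plausible choice, and the helper names `centroid`, `euclidean`, and `nearest_other` are hypothetical.

```python
import math

# Hypothetical toy feature vectors (NOT data from the paper). Each row stands
# for one translated text described by three made-up linguistic feature scores.
# The values were chosen by hand purely to illustrate the computation.
samples = {
    "HT": [[0.9, 0.2, 0.5], [1.1, 0.1, 0.4], [1.0, 0.3, 0.6]],
    "NMT": [[0.2, 0.8, 0.5], [0.3, 0.9, 0.4], [0.1, 0.7, 0.6]],
    "ChatGPT": [[0.4, 0.7, 0.5], [0.5, 0.8, 0.4], [0.3, 0.6, 0.6]],
}


def centroid(rows):
    """Return the mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def euclidean(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One centroid per translation type.
centroids = {label: centroid(rows) for label, rows in samples.items()}


def nearest_other(label):
    """Return the other translation type whose centroid lies closest."""
    others = [name for name in centroids if name != label]
    return min(others, key=lambda name: euclidean(centroids[label], centroids[name]))


# Print the pairwise centroid distances for the three translation types.
for a, b in [("ChatGPT", "NMT"), ("ChatGPT", "HT"), ("NMT", "HT")]:
    print(f"{a} vs {b}: {euclidean(centroids[a], centroids[b]):.3f}")
```

With real data, each vector would hold the extracted linguistic features (or MDA dimension scores) of one text, and `nearest_other("ChatGPT")` would indicate which translation type ChatGPT output resembles most under this particular distance. The toy values here were constructed by hand, so the printed distances are not evidence for the paper's finding.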

URL

https://arxiv.org/abs/2312.10750

PDF

https://arxiv.org/pdf/2312.10750.pdf

