Abstract
The growing popularity of neural machine translation (NMT) and large language models (LLMs), represented by ChatGPT, underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers to make informed decisions about these cutting-edge translation technologies and use them judiciously, yet it remains underexplored. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. After a wide range of linguistic features is extracted, supervised classifiers distinguish the three translation types with high accuracy, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations are more similar to NMT than to HT in most MDA dimensions, which is further corroborated by distance computation and visualization. These novel insights shed light on the interrelationships among the three translation types and have implications for the future advancement of NMT and generative AI.
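The abstract does not specify how the "distance computation" between translation types is done. One common reading, assumed here, is to average each group's per-text MDA dimension scores into a centroid and take the Euclidean distance between centroids. The function names and the toy input below are hypothetical and chosen only to exercise the code. They are not the study's data or results.

```python
import math
from statistics import mean


def centroid(rows):
    """Mean vector of a list of equal-length dimension-score vectors."""
    return [mean(col) for col in zip(*rows)]


def pairwise_centroid_distances(groups):
    """Euclidean distance between group centroids (an assumed metric).

    groups: dict mapping a translation-type label (e.g. "HT", "NMT",
    "ChatGPT") to a list of per-text MDA dimension-score vectors.
    Returns a dict keyed by alphabetically ordered label pairs.
    """
    cents = {name: centroid(rows) for name, rows in groups.items()}
    names = sorted(cents)
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distances[(a, b)] = math.dist(cents[a], cents[b])
    return distances


if __name__ == "__main__":
    # Invented two-dimension scores, for illustration only.
    toy_groups = {
        "HT": [[4.0, 1.0], [6.0, 3.0]],
        "NMT": [[1.0, 0.0], [1.0, 2.0]],
        "ChatGPT": [[2.0, 1.0], [2.0, 1.0]],
    }
    for pair, d in pairwise_centroid_distances(toy_groups).items():
        print(pair, round(d, 3))
```

A smaller centroid distance means the two groups' average dimension profiles are closer. Other metrics (for example, cosine or per-dimension mean differences) would be equally plausible choices under this sketch.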
Abstract (translated)
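With the growing popularity of neural machine translation (NMT) and large language models (LLMs) such as ChatGPT, a deeper understanding of their distinct characteristics and relationships is needed. However, such understanding has so far been neglected. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of similarity between ChatGPT-generated translations and HT or NMT. To achieve these goals, we use statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's remarks and their translations. After a wide range of linguistic features is extracted, supervised classifiers distinguish the three translation types with high accuracy, whereas unsupervised clustering techniques do not yield satisfactory results. Another important finding is that ChatGPT-generated translations are more similar to NMT than to HT in most MDA dimensions, a finding further confirmed by distance computation and visualization. These new insights reveal the interrelationships among the three translation types and have implications for the future development of NMT and generative AI.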
URL
https://arxiv.org/abs/2312.10750