Large Language Models 'Ad Referendum': How Good Are They at Machine Translation in the Legal Domain?

2024-02-12 14:40:54
Vicent Briva-Iglesias, Joao Lucas Cavalheiro Camargo, Gokhan Dogru

Abstract

This study evaluates the machine translation (MT) quality of two state-of-the-art large language models (LLMs) against a traditional neural machine translation (NMT) system across four language pairs in the legal domain. It combines automatic evaluation metrics (AEMs) and human evaluation (HE) by professional translators to assess translation ranking, fluency and adequacy. The results indicate that while Google Translate generally outperforms LLMs in AEMs, human evaluators rate LLMs, especially GPT-4, comparably or slightly better in terms of producing contextually adequate and fluent translations. This discrepancy suggests LLMs' potential in handling specialized legal terminology and context, highlighting the importance of human evaluation methods in assessing MT quality. The study underscores the evolving capabilities of LLMs in specialized domains and calls for reevaluation of traditional AEMs to better capture the nuances of LLM-generated translations.

URL

https://arxiv.org/abs/2402.07681

PDF

https://arxiv.org/pdf/2402.07681.pdf

