Combining Advanced Methods in Japanese-Vietnamese Neural Machine Translation

2018-05-18 10:36:37

Thi-Vinh Ngo, Thanh-Le Ha, Phuong-Thai Nguyen, Le-Minh Nguyen

arXiv (cs.CL)

Abstract

Neural machine translation (NMT) systems have recently achieved state-of-the-art results for many popular language pairs, thanks to the availability of data for those pairs. For low-resourced language pairs, little research has been done in this field because of the lack of bilingual data. In this paper, we attempt to build the first NMT systems for a low-resourced language pair: Japanese-Vietnamese. We also show significant improvements when combining advanced methods to reduce the adverse impacts of data sparsity and improve the quality of NMT systems. In addition, we propose a variant of the Byte-Pair Encoding (BPE) algorithm to perform effective word segmentation for Vietnamese texts and to alleviate the rare-word problem that persists in NMT systems.
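The abstract does not describe how the proposed BPE variant works. For readers unfamiliar with the baseline it modifies, below is a minimal Python sketch of standard BPE merge learning in the style of Sennrich et al. (2016). It is not the authors' variant, and the function names (get_pair_counts, merge_pair, learn_bpe, apply_bpe) and toy word counts are hypothetical. Each iteration counts adjacent symbol pairs weighted by word frequency and merges the most frequent pair into a new symbol.

```python
import collections
import re

# Sketch of standard BPE (Sennrich et al., 2016); NOT the paper's Vietnamese variant.
# Function names and the toy data below are hypothetical.

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Join every standalone occurrence of `pair` into one symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from {word: count}."""
    # Represent each word as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a single word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4)
print(merges)                        # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
print(apply_bpe("lowest", merges))   # ['lo', 'w', 'est</w>']
```

Note that an unseen word such as "lowest" is segmented into known subword units rather than mapped to an unknown token, which is how BPE addresses the rare-word problem. Because Vietnamese orthography separates syllables with spaces, running this sketch on raw whitespace-tokenized Vietnamese text would learn merges within individual syllables, not across the syllables of a multi-syllable word.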

URL

https://arxiv.org/abs/1805.07133

PDF

https://arxiv.org/pdf/1805.07133.pdf