Training Neural Machine Translation using Word Embedding-based Loss

2018-07-30 08:11:52
Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura

Abstract

In neural machine translation (NMT), the computational cost at the output layer increases with the size of the target-side vocabulary. Using a limited-size vocabulary instead may cause a significant decrease in translation quality. This trade-off stems from the softmax-based loss function, which handles in-vocabulary words independently and does not consider word similarity. In this paper, we propose a novel NMT loss function that incorporates word similarity in the form of distances in a word embedding space. The proposed loss function encourages an NMT decoder to generate words close to their references in the embedding space; this helps the decoder choose similar, acceptable words when the actual best candidates are not included in the vocabulary because of its size limitation. In experiments using the ASPEC Japanese-to-English and IWSLT17 English-to-French data sets, the proposed method improved on a standard NMT baseline on both data sets; with IWSLT17 En-Fr in particular, it achieved up to +1.72 in BLEU and +1.99 in METEOR. When the target-side vocabulary was limited to only 1,000 words, the proposed method demonstrated a substantial gain of +1.72 in METEOR on ASPEC Ja-En.
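The abstract does not give the exact form of the loss, so the following is only a minimal sketch of the general idea, under stated assumptions: standard cross-entropy plus a penalty equal to the expected Euclidean distance between the predicted word's embedding and the reference word's embedding. The function name `embedding_aware_loss`, the `weight` parameter, the choice of Euclidean distance, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embedding_aware_loss(logits, ref_index, embeddings, weight=1.0):
    """Cross-entropy plus the expected embedding distance to the reference.

    ASSUMPTION: this combination is illustrative only; it is not the
    formula from the paper.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[ref_index])
    ref_vec = embeddings[ref_index]
    # Probability-weighted distance of every vocabulary word to the reference.
    expected_distance = sum(
        p * euclidean(vec, ref_vec) for p, vec in zip(probs, embeddings)
    )
    return cross_entropy + weight * expected_distance


# Hypothetical 3-word vocabulary: index 0 and 1 are near-synonyms, 2 is unrelated.
embeddings = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]

# Reference is word 0 in both cases; the decoder puts most mass elsewhere.
near_miss = embedding_aware_loss([1.0, 3.0, 0.0], 0, embeddings)  # mass on word 1
far_miss = embedding_aware_loss([1.0, 0.0, 3.0], 0, embeddings)   # mass on word 2
```

In this toy setup both predictions assign the same probability to the reference word, so plain cross-entropy cannot tell them apart. The distance term makes the loss smaller when the probability mass falls on the nearby word, which is the behaviour the abstract describes for cases where the best candidate is missing from a small vocabulary.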

URL

https://arxiv.org/abs/1807.11219

PDF

https://arxiv.org/pdf/1807.11219.pdf

