Abstract
Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps preserve and carry forward the legacy of the ancients. In this paper, we propose an Ancient-Modern Chinese clause alignment approach and apply it to create a large-scale Ancient-Modern Chinese parallel corpus containing about 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset. Furthermore, we train an SMT model and various NMT-based models on this dataset and provide strong baselines for this task.
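The abstract does not say how the clause alignment works, so the sketch below only illustrates what aligning ancient clauses to modern clauses can look like. It is NOT the paper's method: it is a Gale-Church-style dynamic-programming aligner with a character-overlap score swapped in for Gale-Church's length-based cost. It assumes that many ancient characters survive in the modern rendering. The names `match_count`, `align_clauses`, `SKIP_PENALTY` and `MERGE_PENALTY` and the penalty values are hypothetical.

```python
# Hypothetical sketch: order-preserving clause alignment by dynamic programming.
# Score = number of ancient characters that also occur in the modern text
# (an assumption, not the paper's scoring function).

SKIP_PENALTY = 1.0   # hypothetical cost for leaving a clause unaligned (1-0 / 0-1)
MERGE_PENALTY = 0.5  # hypothetical cost for 1-2 / 2-1 merges, so 1-1 is preferred

# Allowed "beads": (ancient clauses consumed, modern clauses consumed)
BEADS = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def match_count(ancient: str, modern: str) -> int:
    """Count characters of the ancient text that also appear in the modern text."""
    modern_chars = set(modern)
    return sum(ch in modern_chars for ch in ancient)


def bead_score(ancient: str, modern: str, da: int, dm: int) -> float:
    if da == 0 or dm == 0:
        return -SKIP_PENALTY
    score = float(match_count(ancient, modern))
    return score - MERGE_PENALTY if (da, dm) != (1, 1) else score


def align_clauses(ancient: list[str], modern: list[str]) -> list[tuple[str, str]]:
    """Return aligned (ancient, modern) pairs that maximize the total bead score."""
    n, m = len(ancient), len(modern)
    neg = float("-inf")
    # best[i][j]: best total score aligning ancient[:i] with modern[:j]
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back: list[list[tuple[int, int] | None]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg:
                continue
            for da, dm in BEADS:
                ni, nj = i + da, j + dm
                if ni > n or nj > m:
                    continue
                a_text = "".join(ancient[i:ni])
                m_text = "".join(modern[j:nj])
                cand = best[i][j] + bead_score(a_text, m_text, da, dm)
                if cand > best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (i, j)
    # Trace the best path back from the end to build the aligned pairs.
    pairs: list[tuple[str, str]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append(("".join(ancient[pi:i]), "".join(modern[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    ancient = ["学而时习之", "不亦说乎"]
    modern = ["学习了又按时温习它", "不也很愉快吗"]
    for a, mo in align_clauses(ancient, modern):
        print(a, "<->", mo)
```

On this toy pair the two 1-1 beads score 3 + 1 = 4, which beats every merge or skip path, so each ancient clause is paired with its own modern clause. A real aligner would likely combine several signals (e.g. lexical overlap and length ratio) and tune the penalties on annotated data.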
URL
https://arxiv.org/abs/1808.03738