Abstract
One approach to multilingual data-to-text generation is to translate grammatical configurations upfront from the source language into each target language. These configurations are then used by the surface realizer and in the document planning stages to generate output. In this paper, we describe a rule-based NLG implementation of this approach in which the configuration is translated by Neural Machine Translation (NMT) combined with a one-time human review. We also introduce a cross-language grammar dependency model to create a multilingual NLG system that generates text from the source data, scaling the generation phase without a human in the loop. Additionally, we introduce a method for human post-editing evaluation of the automatically translated text. Our evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, underlining its grammatical correctness in translation tasks.
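To make the idea of upfront-translated configurations concrete, the following is a minimal sketch and not the system described in the paper. It assumes a hypothetical `GrammarConfig` holding a sentence template and number-dependent noun forms for each language. The German entry stands in for a configuration that would have been produced once by NMT and human review. The `realize` function, the record fields, and the example sentences are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarConfig:
    """Hypothetical per-language configuration, translated once upfront."""
    template: str
    noun_forms: dict  # grammatical number ("one"/"other") -> noun form

    def noun_for(self, count: int) -> str:
        # Simplified number agreement; real languages may need more categories.
        return self.noun_forms["one" if count == 1 else "other"]


# Assumed configurations: "en" is the source, "de" the pre-translated target.
CONFIGS = {
    "en": GrammarConfig("{team} scored {points} {noun}.",
                        {"one": "point", "other": "points"}),
    "de": GrammarConfig("{team} erzielte {points} {noun}.",
                        {"one": "Punkt", "other": "Punkte"}),
}


def realize(record: dict, lang: str) -> str:
    """Generate a sentence from source data using the language's configuration."""
    config = CONFIGS[lang]
    noun = config.noun_for(record["points"])
    return config.template.format(team=record["team"],
                                  points=record["points"], noun=noun)


if __name__ == "__main__":
    record = {"team": "Toronto", "points": 102}
    for lang in CONFIGS:
        print(lang, realize(record, lang))
```

Because the target-language configuration is fixed ahead of time, generating text for a new data record in this sketch involves no translation step and no human review at generation time.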
Abstract (translated)
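One method for multilingual data-to-text generation is to translate the grammatical structures of the source language into each target language in advance. These configurations are then used by the surface realizer and the document planning stages to generate output. In this paper, we describe a rule-based NLG (natural language generation) implementation in which the configuration is translated by Neural Machine Translation (NMT) combined with a one-time human review, and we introduce a cross-language grammar dependency model to create a multilingual NLG system that generates text from the source data and scales the generation phase without human intervention. In addition, we present a method for human post-editing evaluation of the automatically translated text. Evaluation on the SportSett:Basketball dataset shows that our NLG system performs well, highlighting its grammatical correctness in translation tasks.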
URL
https://arxiv.org/abs/2501.16135