Abstract
Recent advances in generative modeling of text have demonstrated remarkable improvements in fluency and coherence. In this work we investigate to what extent a machine can discriminate real text from machine-generated text. This is important in itself for the automatic detection of computer-generated stories, but it can also serve as a tool for further improving text generation. We show that learning a dedicated scoring function to discriminate between real and fake text achieves higher precision than employing the likelihood of a generative model. The scoring functions generalize to generators other than those used for training, as long as these generators have comparable model complexity and are trained on similar datasets.
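The contrast between the two detection strategies named in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's model or data: the unigram language model, the bag-of-words logistic scorer, the four example sentences, and every name below (`UnigramLM`, `LearnedScorer`, `avg_log_likelihood`, `score`) are assumptions made for illustration only. The first class scores text by its likelihood under a generative model fitted to real text; the second learns a dedicated scoring function from labeled real and fake examples.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; an assumption made to keep the sketch small.
    return text.lower().split()


class UnigramLM:
    """Toy generative model: add-one smoothed unigram distribution."""

    def __init__(self, texts):
        self.counts = Counter(tok for t in texts for tok in tokenize(t))
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen tokens

    def avg_log_likelihood(self, text):
        # Likelihood-based score: mean per-token log-probability.
        toks = tokenize(text)
        if not toks:
            return 0.0
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[t] + 1) / denom) for t in toks) / len(toks)


class LearnedScorer:
    """Toy dedicated scoring function: logistic regression on token counts."""

    def __init__(self):
        self.weights = Counter()
        self.bias = 0.0

    def score(self, text):
        # Higher score means "more likely real"; repeated tokens count repeatedly.
        return self.bias + sum(self.weights[t] for t in tokenize(text))

    def fit(self, real_texts, fake_texts, epochs=50, lr=0.1):
        data = [(t, 1.0) for t in real_texts] + [(t, 0.0) for t in fake_texts]
        for _ in range(epochs):
            for text, label in data:
                p = 1.0 / (1.0 + math.exp(-self.score(text)))
                grad = label - p  # gradient of the log-likelihood w.r.t. the score
                self.bias += lr * grad
                for tok in tokenize(text):
                    self.weights[tok] += lr * grad
        return self


# Hypothetical toy data: "fake" text mimics a degenerate, repetitive generator.
real = ["the cat sat on the mat", "the dog sat on the rug"]
fake = ["cat cat cat mat mat mat", "dog dog dog rug rug rug"]

lm = UnigramLM(real)                      # generative model sees real text only
scorer = LearnedScorer().fit(real, fake)  # discriminator sees both classes

for text in real + fake:
    print(f"{text!r}: lm={lm.avg_log_likelihood(text):.3f} "
          f"learned={scorer.score(text):.3f}")
```

In this sketch a detector would threshold either score. The example only shows the mechanics of the two approaches; it says nothing about which one is more precise, which is the empirical question the paper addresses.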
URL
https://arxiv.org/abs/1906.03351