Abstract
Due to the rapid development of text generation models, people increasingly often encounter texts that begin as human-written but continue as machine-generated output of large language models. Detecting the boundary between the human-written and machine-generated parts of such texts is a challenging problem that has received little attention in the literature. In this work, we consider a number of different approaches to this artificial text boundary detection problem, comparing several predictors built on features of different kinds. We show that supervised fine-tuning of the RoBERTa model works well for this task in general but fails to generalize in important cross-domain and cross-generator settings, demonstrating a tendency to overfit to spurious properties of the data. We then propose novel approaches based on features extracted from a frozen language model's embeddings that outperform both the human accuracy level and previously considered baselines on the Real or Fake Text benchmark. Moreover, we adapt perplexity-based approaches to the boundary detection task and analyze their behaviour. Finally, we analyze the robustness of all proposed classifiers in cross-domain and cross-model settings, uncovering important properties of the data that can negatively affect the performance of artificial text boundary detection algorithms.
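To make the perplexity-based idea concrete, the following is a minimal illustrative sketch, not the paper's exact procedure: it assumes that the machine-generated continuation tends to have lower per-token negative log-likelihood (NLL) under a frozen causal language model than the human-written prefix, and guesses the boundary as the split point that maximizes this prefix-vs-suffix contrast. The choice of GPT-2 as the scoring model and the contrast heuristic itself are assumptions made for illustration only.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Frozen scoring model (GPT-2 chosen only for illustration).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_nlls(text: str) -> torch.Tensor:
    """Per-token NLL of `text` under the frozen LM (length = n_tokens - 1)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so that the logits at position t score the token at position t + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)

def guess_boundary(text: str) -> int:
    """Guess the token index where human-written text switches to machine-generated."""
    nll = token_nlls(text)
    best_split, best_gap = 1, float("-inf")
    for k in range(1, len(nll)):
        # Prefix should be "harder" (higher NLL) than the suffix if the suffix is generated.
        gap = (nll[:k].mean() - nll[k:].mean()).item()
        if gap > best_gap:
            best_gap, best_split = gap, k
    return best_split

In practice, such a detector would be one of several perplexity-based variants to compare against classifiers trained on frozen-embedding features and against fine-tuned RoBERTa.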
URL
https://arxiv.org/abs/2311.08349