Abstract
The efficacy of a detector for large language model (LLM) generated text depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by their need for access to the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. This approach entails computing the Grammar Error Correction Score (GECScore) of a given text to distinguish between human-written and LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrase and adversarial perturbation attacks.
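The GECScore idea can be sketched as follows: run a grammar corrector over the input and measure how similar the corrected text is to the original; text needing many corrections (more typical of human writing) scores lower. This is a minimal illustrative sketch, not the paper's implementation: `correct_grammar` is a hypothetical stand-in for a real GEC model, and the similarity ratio is one plausible scoring choice.

```python
import difflib

def correct_grammar(text: str) -> str:
    # Hypothetical placeholder for a real grammar-error-correction model
    # (in practice, a seq2seq GEC system would be called here).
    # For illustration, it fixes a single hard-coded error.
    return text.replace("has went", "has gone")

def gec_score(text: str) -> float:
    # GECScore sketch: similarity between the original text and its
    # grammar-corrected version. Fewer corrections -> score closer to 1.0.
    corrected = correct_grammar(text)
    return difflib.SequenceMatcher(None, text, corrected).ratio()

human_like = "She has went to the store yesterday."   # contains an error
llm_like = "She went to the store yesterday."         # already grammatical

# The grammatical (LLM-like) text is unchanged by correction and scores 1.0;
# the erroneous (human-like) text scores lower, enabling a threshold-based
# decision.
assert gec_score(llm_like) > gec_score(human_like)
```

A real detector would then compare the score against a threshold calibrated on a small sample, classifying low-scoring texts as human-written.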
URL
https://arxiv.org/abs/2405.04286