DUPE: Detection Undermining via Prompt Engineering for Deepfake Text

Abstract

As large language models (LLMs) become increasingly commonplace, concern about distinguishing human-written text from AI-generated text grows as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing an unfamiliar technology, teachers may turn to publicly available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, which risks harming students who are falsely accused of academic dishonesty. In this paper, we evaluate three AI text detectors (Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero) against human-written and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and high false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.
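
The two error rates the abstract reports can be made concrete with a small sketch. The function name `error_rates` and the toy labels below are illustrative assumptions, not the paper's code or data; the sketch only shows how a false positive rate (human essays flagged as AI) and a false negative rate (AI essays passed as human) are typically computed from a detector's verdicts.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels      -- ground truth, True means the essay is AI-generated
    predictions -- detector verdict, True means "flagged as AI"
    Both names are hypothetical; the paper does not publish this interface.
    """
    pairs = list(zip(labels, predictions))
    # False positive: a human essay that the detector flags as AI.
    false_pos = sum(1 for y, p in pairs if not y and p)
    # False negative: an AI essay that the detector lets through.
    false_neg = sum(1 for y, p in pairs if y and not p)
    n_human = sum(1 for y in labels if not y)
    n_ai = sum(1 for y in labels if y)
    fpr = false_pos / n_human if n_human else float("nan")
    fnr = false_neg / n_ai if n_ai else float("nan")
    return fpr, fnr


# Made-up toy data: 4 human essays followed by 4 AI essays.
toy_labels = [False, False, False, False, True, True, True, True]
toy_predictions = [True, False, False, False, True, True, False, False]
fpr, fnr = error_rates(toy_labels, toy_predictions)
print(f"FPR={fpr:.2f} FNR={fnr:.2f}")
```

In this framing, a paraphrasing attack succeeds when the false negative rate rises on the paraphrased AI essays while the human essays, and therefore the false positive rate, are left unchanged.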

URL

https://arxiv.org/abs/2404.11408

PDF

https://arxiv.org/pdf/2404.11408.pdf
