Paper Reading AI Learner

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

2023-01-26 18:44:06
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn

Abstract

The fluency and factual knowledge of large language models (LLMs) heightens the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g, T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See this https URL for code, data, and other project information.

Abstract (translated)

大型语言模型(LLM)的流畅和事实知识增加了检测文本是否人为编写的系统需求。例如,学生可以使用LLM来完成书面作业,从而使教练无法准确评估学生学习。在本文中,我们首先证明了从LLM中采样的文字倾向于占据模型的log概率函数的负曲率区域。利用这一观察,我们然后定义了一种新的曲率基于标准差的标准来判断一段文本是否来自给定LLM。我们称之为DetectGPT,它不需要训练一个单独的分类器、收集真实或生成的文本数据集,或明确嵌入生成的文本。它仅使用由感兴趣的模型计算的log概率和随机扰动生成的文本的样本数据集(例如T5)。我们发现DetectGPT在模型样本检测方面比现有的零样本方法更为有效,特别是改进了由20B参数GPT-NeoX生成的假新闻文章的检测效果,从0.81 AUROC的最强的零样本基线到DetectGPT的0.95 AUROC。见这个httpsURL以代码、数据和其他项目信息。

URL

https://arxiv.org/abs/2301.11305

PDF

https://arxiv.org/pdf/2301.11305.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot