ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

2023-01-30 08:06:08
Sandra Mitrović, Davide Andreoletti, Omran Ayoub

Abstract

ChatGPT has the ability to generate grammatically flawless and seemingly human replies to different types of questions from various domains. The number of its users and of its applications is growing at an unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second experiment involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite and impersonal, lacks specific details, uses fancy and atypical vocabulary, and typically does not express feelings.
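
The perplexity score-based baseline mentioned in the abstract can be illustrated with a minimal sketch. Perplexity is the exponential of the negative mean log-probability a language model assigns to each token. Everything else here is an assumption for illustration, not the authors' implementation: the function names `perplexity` and `label_by_perplexity` are hypothetical, the per-token log-probabilities are assumed to come from some external language model, and the rule "lower perplexity means ChatGPT-generated" with a hand-picked threshold is one plausible decision rule, not one confirmed by the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) over per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def label_by_perplexity(token_log_probs: Sequence[float], threshold: float) -> str:
    """Hypothetical rule (an assumption): text the scoring model finds
    highly predictable (perplexity below the threshold) is labeled
    'chatgpt', everything else 'human'."""
    return "chatgpt" if perplexity(token_log_probs) < threshold else "human"


if __name__ == "__main__":
    # Made-up log-probabilities, standing in for a real language model's output.
    predictable = [math.log(0.5)] * 4     # every token has probability 0.5
    surprising = [math.log(0.01)] * 3     # every token has probability 0.01
    print(label_by_perplexity(predictable, threshold=10.0))
    print(label_by_perplexity(surprising, threshold=10.0))
```

In practice the threshold would have to be tuned on labeled reviews, and the log-probabilities would depend on which language model is used for scoring; neither detail is specified in the abstract.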

URL

https://arxiv.org/abs/2301.13852

PDF

https://arxiv.org/pdf/2301.13852.pdf

