
MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews

2024-04-19 15:08:06
Oana Ignat, Xiaomeng Xu, Rada Mihalcea

Abstract

Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily on English, with very little work dedicated to other languages. In this paper, we compile and make publicly available the MAiDE-up dataset, consisting of 10,000 real and 10,000 AI-generated fake hotel reviews, balanced across ten languages. Using this dataset, we conduct extensive linguistic analyses to (1) compare the AI fake hotel reviews to real hotel reviews, and (2) identify the factors that influence the deception detection model performance. We explore the effectiveness of several models for deception detection in hotel reviews across three main dimensions: sentiment, location, and language. We find that these dimensions influence how well we can detect AI-generated fake reviews.
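
The abstract reports detection performance broken down along three dimensions: sentiment, location, and language. A minimal sketch of such a stratified evaluation follows. It assumes each review carries those three attributes plus a binary real/AI-generated label, and that a detector has already produced one boolean prediction per review. The `Review` fields and the `accuracy_by_dimension` helper are hypothetical illustrations, not the authors' code or the MAiDE-up dataset's actual schema. The language codes and locations in the usage lines are also made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record layout; field names are illustrative, not the dataset's schema.
    language: str
    sentiment: str
    location: str
    is_ai_generated: bool  # gold label: True = AI-generated fake, False = real


def accuracy_by_dimension(reviews, predictions, dimension):
    """Return detection accuracy grouped by one attribute, e.g. 'language'."""
    if len(reviews) != len(predictions):
        raise ValueError("need exactly one prediction per review")
    correct = defaultdict(int)
    total = defaultdict(int)
    for review, predicted in zip(reviews, predictions):
        key = getattr(review, dimension)  # e.g. review.language
        total[key] += 1
        correct[key] += int(predicted == review.is_ai_generated)
    return {key: correct[key] / total[key] for key in total}


# Toy usage with made-up records and predictions.
reviews = [
    Review("en", "positive", "city_a", is_ai_generated=False),
    Review("en", "negative", "city_a", is_ai_generated=True),
    Review("de", "positive", "city_b", is_ai_generated=True),
    Review("de", "negative", "city_b", is_ai_generated=False),
]
predictions = [False, False, True, True]
print(accuracy_by_dimension(reviews, predictions, "language"))   # {'en': 0.5, 'de': 0.5}
print(accuracy_by_dimension(reviews, predictions, "sentiment"))  # {'positive': 1.0, 'negative': 0.0}
```

Grouping by an attribute name keeps one helper usable for all three dimensions. The same toy predictions can look uniform by language yet sharply uneven by sentiment, which is the kind of effect a per-dimension breakdown is meant to expose.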

URL

https://arxiv.org/abs/2404.12938

PDF

https://arxiv.org/pdf/2404.12938.pdf

