Paper Reading AI Learner

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

2024-05-07 07:57:15
Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Abstract

The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos were manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.
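The study design described above (the same 40 clips shown in two independently shuffled rounds, then scored for correctness) can be sketched as follows. This is a minimal illustration of the protocol, not the authors' actual code; the video identifiers and function names are hypothetical placeholders.

```python
import random

def make_rounds(real_videos, fake_videos, seed=0):
    """Build two viewing rounds of the same 40 clips
    (20 real + 20 fake), shuffled independently per round,
    as in the study's repeated-presentation design.
    Video IDs are hypothetical, not actual FakeAVCeleb files."""
    clips = [(v, "real") for v in real_videos] + \
            [(v, "fake") for v in fake_videos]
    rng = random.Random(seed)
    round1 = rng.sample(clips, len(clips))  # shuffled copy, order 1
    round2 = rng.sample(clips, len(clips))  # same clips, order 2
    return round1, round2

def accuracy(true_labels, judgments):
    """Fraction of clips whose authenticity was judged correctly,
    usable for both human participants and detection models."""
    correct = sum(1 for t, j in zip(true_labels, judgments) if t == j)
    return correct / len(true_labels)
```

Scoring humans and models with the same `accuracy` function on the same 40 clips is what makes the human-versus-machine comparison in the abstract direct and like-for-like.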

URL

https://arxiv.org/abs/2405.04097

PDF

https://arxiv.org/pdf/2405.04097.pdf
