Abstract
The emergence of contemporary deepfakes has attracted significant attention in machine learning research, because synthetic media generated by artificial intelligence (AI) is difficult to distinguish from genuine content and increases the incidence of misinterpretation. Machine learning techniques for automatically detecting deepfakes have been studied extensively, but human perception has been explored far less. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the videos we watch? The answer is far from certain; this paper therefore evaluates the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could watch a series of 40 videos (20 real and 20 fake) and judge their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos were manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensic analysis, and enable adaptive countermeasures.
URL
https://arxiv.org/abs/2405.04097