An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape

2024-04-24 21:21:50
Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath


Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models (machine learning models trained on broad data that can be easily adapted to several downstream tasks) can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.
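
The abstract mentions ensemble modeling but does not say how the detectors' outputs are combined. As an illustration only, the sketch below assumes soft voting: each detector returns a probability that the image is fake, and the ensemble averages them. The function names, the 0.5 threshold, and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_fake_probability(scores):
    """Average per-detector fake probabilities (soft voting, an assumed scheme)."""
    if not scores:
        raise ValueError("need at least one detector score")
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is not a probability in [0, 1]")
    return mean(scores)


def is_deepfake(scores, threshold=0.5):
    """Flag the image as fake when the averaged probability reaches the threshold."""
    return ensemble_fake_probability(scores) >= threshold


# Hypothetical scores from three detectors for one image.
print(is_deepfake([0.9, 0.8, 0.2]))
```

Averaging lets a detector that generalizes poorly to one generator be outvoted by the others. Whether that helps in practice depends on how correlated the detectors' errors are.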



