Texture-aware and Shape-guided Transformer for Sequential DeepFake Detection

2024-04-22 04:47:52
Yunfei Li, Jiaran Zhou, Xin Wang, Junyu Dong, Yuezun Li

Abstract

Sequential DeepFake detection is an emerging task that aims to predict the sequence of manipulations applied to a face, in order. Existing methods typically formulate it as an image-to-sequence problem and employ conventional Transformer architectures for detection. However, these methods lack designs dedicated to the task, which limits their performance. In this paper, we propose a novel Texture-aware and Shape-guided Transformer to enhance detection performance. Our method features four major improvements. Firstly, we describe a texture-aware branch that effectively captures subtle manipulation traces with the Diversiform Pixel Difference Attention module. Then we introduce a Bidirectional Interaction Cross-attention module that seeks deep correlations between spatial and sequential features, enabling effective modeling of complex manipulation traces. To further enhance the cross-attention, we describe a Shape-guided Gaussian mapping strategy, which provides initial priors of the manipulation shape. Finally, observing that later manipulations in a sequence may alter the traces left by earlier ones, we invert the prediction order from forward to backward, which yields notable gains. Extensive experimental results demonstrate that our method outperforms existing methods by a large margin.
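
The inverted prediction order mentioned in the abstract can be illustrated with a minimal sketch of how a training target might be built. This is not the authors' implementation: the function names, the `<pad>` token, and the example manipulation labels are all hypothetical, and the sketch assumes only that a fixed-length target is formed by reversing the ground-truth manipulation sequence.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding token, not taken from the paper


def to_backward_target(manipulations: List[str], max_len: int) -> List[str]:
    """Reverse a manipulation sequence and pad it to max_len.

    The last applied manipulation becomes the first token to predict.
    """
    if len(manipulations) > max_len:
        raise ValueError("sequence longer than max_len")
    reversed_seq = list(reversed(manipulations))
    return reversed_seq + [PAD] * (max_len - len(reversed_seq))


def to_forward_order(backward_pred: List[str]) -> List[str]:
    """Drop padding and restore the original (forward) order."""
    return [m for m in reversed(backward_pred) if m != PAD]


# Hypothetical example: three manipulations applied in this order.
applied = ["eye", "nose", "lip"]
target = to_backward_target(applied, max_len=5)
restored = to_forward_order(target)
```

Under these assumptions, a model trained on `target` predicts the most recent manipulation first, and `to_forward_order` recovers the sequence in the order it was applied for evaluation.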

URL

https://arxiv.org/abs/2404.13873

PDF

https://arxiv.org/pdf/2404.13873.pdf

