Paper Reading AI Learner

Learning Expressive And Generalizable Motion Features For Face Forgery Detection

2024-03-08 09:25:48
Jingyi Zhang, Peng Zhang, Jingjing Wang, Di Xie, Shiliang Pu

Abstract

Previous face forgery detection methods mainly focus on appearance features, which sophisticated manipulations may easily evade. Because most current face manipulation methods generate fake faces frame by frame, without accounting for consistency and coordination across frames, artifacts in frame sequences are a more effective cue for face forgery detection. However, current sequence-based face forgery detection methods apply general video classification networks directly, discarding the special, discriminative motion information that matters for manipulation detection. To this end, we propose an effective sequence-based forgery detection framework built on an existing video classification method. To make the motion features more expressive for manipulation detection, we propose an alternative motion consistency block that replaces the original motion features module. To make the learned features more generalizable, we propose an auxiliary anomaly detection block. With these two specially designed improvements, a general video classification network achieves promising results on three popular face forgery datasets.
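
The abstract's premise is that manipulations generated one frame at a time leave temporal artifacts. As a rough illustration only, the sketch below is not the authors' motion consistency block, whose design the abstract does not describe. It assumes per-frame feature vectors have already been extracted, treats the difference between consecutive frame features as a crude motion vector, and scores how often that motion changes direction. The function names `motion_inconsistency` and `_cosine` are hypothetical.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with explicit handling of zero-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 and nb == 0.0:
        return 1.0  # no motion on either step: treat as consistent
    if na == 0.0 or nb == 0.0:
        return 0.0  # motion starts or stops abruptly: neutral score
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def motion_inconsistency(frame_feats: List[Sequence[float]]) -> float:
    """Hypothetical score in [0, 1]: 0 = steady motion, 1 = motion reverses every frame.

    Illustrative assumption only, not the paper's method.
    """
    if len(frame_feats) < 3:
        raise ValueError("need at least 3 frames to compare consecutive motions")
    # Crude "motion" vectors: differences between consecutive frame features.
    motions = [
        [b - a for a, b in zip(frame_feats[t], frame_feats[t + 1])]
        for t in range(len(frame_feats) - 1)
    ]
    # Direction agreement between each pair of consecutive motion vectors.
    sims = [_cosine(motions[t], motions[t + 1]) for t in range(len(motions) - 1)]
    mean_sim = sum(sims) / len(sims)
    # Map the mean cosine from [-1, 1] to an inconsistency score in [0, 1].
    return (1.0 - mean_sim) / 2.0


smooth = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
jitter = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
print(round(motion_inconsistency(smooth), 6), round(motion_inconsistency(jitter), 6))
# prints: 0.0 1.0
```

A learned detector would extract such cues end to end rather than hand-compute them. The sketch only shows why a generator that ignores neighbouring frames can leave a measurable temporal signal that a single-frame appearance detector never sees.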

URL

https://arxiv.org/abs/2403.05172

PDF

https://arxiv.org/pdf/2403.05172.pdf