Abstract
The rapid development of Deepfake technology has enabled the generation of highly realistic manipulated videos, posing severe social and ethical challenges. Existing Deepfake detection methods primarily focus on either spatial or temporal inconsistencies, often neglecting the interplay between the two or suffering from interference caused by natural facial motion. To address these challenges, we propose the global context consistency flow (GC-ConsFlow), a novel dual-stream framework that effectively integrates spatial and temporal features for robust Deepfake detection. The global grouped context aggregation module (GGCA), integrated into the global context-aware frame flow stream (GCAF), enhances spatial feature extraction by aggregating grouped global context information, enabling the detection of subtle spatial artifacts within frames. Rather than directly modeling raw residuals, the flow-gradient temporal consistency stream (FGTC) uses optical flow residuals and gradient-based features to make temporal feature extraction robust to the inconsistencies introduced by unnatural facial motion. By combining these two streams, GC-ConsFlow effectively and robustly captures complementary spatiotemporal forgery traces. Extensive experiments show that GC-ConsFlow outperforms existing state-of-the-art methods in detecting Deepfake videos under various compression scenarios.
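The two ideas the abstract names can be illustrated with a minimal sketch. Note that the function names, the reweighting scheme, and the use of plain nested lists are illustrative assumptions for exposition only, not the authors' actual implementation: GGCA-style aggregation is shown as "pool a global statistic per channel group, then reweight that group", and the FGTC input is shown as a residual between optical-flow fields of consecutive frame pairs, where large residuals hint at temporally inconsistent motion.

```python
def grouped_global_context(features, num_groups):
    """Simplified stand-in for grouped global context aggregation (GGCA idea):
    split the channel features into groups, compute each group's global mean,
    and reweight every channel in the group by that context value.
    `features` is a list of channels, each channel a flat list of activations."""
    group_size = len(features) // num_groups
    out = []
    for g in range(num_groups):
        group = features[g * group_size:(g + 1) * group_size]
        # Global context of this group: mean of per-channel means.
        ctx = sum(sum(ch) / len(ch) for ch in group) / group_size
        out.extend([[v * ctx for v in ch] for ch in group])
    return out

def flow_residual(flow_t, flow_t1):
    """Element-wise residual between two optical-flow fields (2D grids).
    Natural motion yields small, smooth residuals; forgery artifacts tend to
    leave larger, irregular ones — the signal the FGTC stream builds on."""
    return [[a - b for a, b in zip(r0, r1)] for r0, r1 in zip(flow_t, flow_t1)]
```

A detector in this spirit would concatenate the spatial features (first stream) with residual- and gradient-based temporal features (second stream) before classification; the fusion details are not specified in the abstract.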
URL
https://arxiv.org/abs/2501.13435