Abstract
Sequential DeepFake detection is an emerging task that aims to predict, in order, the sequence of manipulations applied to an image. Existing methods typically formulate it as an image-to-sequence problem and employ conventional Transformer architectures for detection. However, these methods lack a design dedicated to the task, which limits their performance. In this paper, we propose a novel Texture-aware and Shape-guided Transformer to enhance detection performance. Our method features four major improvements. First, we describe a texture-aware branch that captures subtle manipulation traces with the Diversiform Pixel Difference Attention module. Second, we introduce a Bidirectional Interaction Cross-attention module that seeks deep correlations between spatial and sequential features, enabling effective modeling of complex manipulation traces. Third, to further enhance the cross-attention, we describe a Shape-guided Gaussian mapping strategy that provides initial priors of the manipulation shape. Finally, observing that a later manipulation in a sequence may influence the traces left by an earlier one, we invert the prediction order from forward to backward, which, as expected, yields notable gains. Extensive experimental results demonstrate that our method outperforms existing methods by a large margin.
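The inverted prediction order mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a target sequence might be reversed for a decoder and then restored for evaluation; it is not the authors' implementation. The special tokens `<sos>` and `<eos>`, the manipulation names, and both helper functions are assumptions made for illustration, since the abstract does not specify any of them.

```python
from typing import List, Sequence

# Hypothetical special tokens; the abstract does not name any.
SOS, EOS = "<sos>", "<eos>"


def to_backward_target(manipulations: Sequence[str]) -> List[str]:
    """Build a decoder target that lists the last-applied manipulation first."""
    return [SOS, *reversed(manipulations), EOS]


def to_forward_prediction(decoded: Sequence[str]) -> List[str]:
    """Drop special tokens and restore the original (forward) order."""
    steps: List[str] = []
    for token in decoded:
        if token == SOS:
            continue
        if token == EOS:
            break  # ignore anything emitted after the end token
        steps.append(token)
    return steps[::-1]


# Hypothetical manipulation order, first applied to last applied.
applied = ["eye", "nose", "lip"]
target = to_backward_target(applied)      # backward-ordered training target
restored = to_forward_prediction(target)  # forward order for evaluation
```

Under this sketch, the decoder would be trained on `target`, and its output would be passed through `to_forward_prediction` before comparison with the ground-truth sequence.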
URL
https://arxiv.org/abs/2404.13873