Abstract
Generative models have made significant advances in creating realistic videos, which raises security concerns. However, this emerging risk has not been adequately addressed, owing to the absence of a benchmark dataset of AI-generated videos. In this paper, we first construct a video dataset with diverse semantic content using advanced diffusion-based video generation algorithms. In addition, typical lossy operations that videos undergo during network transmission are applied to generate degraded samples. Then, by analyzing the local and global temporal defects of current AI-generated videos, we build a novel detection framework that adaptively learns local motion information and global appearance variation to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial- and temporal-domain detection methods; the results can serve as baselines and demonstrate the research challenges for future studies.
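To make the two temporal cues concrete, the sketch below computes crude stand-ins for them on a grayscale clip: per-pixel mean absolute frame differences as "local motion information" and the spread of per-frame mean intensity as "global appearance variation". This is an illustrative simplification, not the paper's actual learned framework; the function name and feature definitions are assumptions for exposition.

```python
import numpy as np

def temporal_features(frames):
    """Toy temporal cues for a video clip (not the paper's method).

    frames: float array of shape (T, H, W) with values in [0, 1].
    Returns (local_motion, global_variation):
      - local_motion: per-pixel mean absolute difference between
        consecutive frames, shape (H, W) -- a crude local-motion map.
      - global_variation: std. dev. of per-frame mean intensity,
        a scalar proxy for global appearance variation over time.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) frame deltas
    local_motion = diffs.mean(axis=0)          # (H, W)
    per_frame_mean = frames.mean(axis=(1, 2))  # (T,)
    global_variation = per_frame_mean.std()
    return local_motion, global_variation

# A static clip should show zero motion and zero appearance variation.
static = np.full((4, 8, 8), 0.5)
m, g = temporal_features(static)
```

A detector could consume such maps and scalars as auxiliary inputs; the paper's framework instead learns both cues adaptively end to end.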
URL
https://arxiv.org/abs/2405.04133