Abstract
Recent advances in deep generative models have led to significant progress in video generation, yet the fidelity of AI-generated videos remains limited. Synthesized content often exhibits visual artifacts such as temporally inconsistent motion, physically implausible trajectories, unnatural object deformations, and local blurring that undermine realism and user trust. Accurate detection and spatial localization of these artifacts are crucial both for automated quality control and for guiding the development of improved generative models. However, the research community currently lacks a comprehensive benchmark specifically designed for artifact localization in AI-generated videos. Existing datasets either restrict themselves to video- or frame-level detection or lack the fine-grained spatial annotations necessary for evaluating localization methods. To address this gap, we introduce BrokenVideos, a benchmark dataset of 3,254 AI-generated videos with meticulously annotated, pixel-level masks highlighting regions of visual corruption. Each annotation is validated through detailed human inspection to ensure high-quality ground truth. Our experiments show that training state-of-the-art artifact detection models and multimodal large language models (MLLMs) on BrokenVideos significantly improves their ability to localize corrupted regions. Through extensive evaluation, we demonstrate that BrokenVideos establishes a critical foundation for benchmarking and advancing research on artifact localization in generative video models. The dataset is available at: this https URL.
URL
https://arxiv.org/abs/2506.20103