Abstract
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods. Unlike traditional deepfake audio generation, which often involves multi-step processes culminating in vocoder usage, ALMs directly utilize neural codec methods to decode discrete codes into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method: the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset covering two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. Additionally, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original Sharpness-Aware Minimization (SAM), we propose the CSAM strategy to learn a domain-balanced and generalized minimum. Experimental results demonstrate that co-training on the Codecfake dataset and a vocoded dataset with the CSAM strategy yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models.
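The CSAM strategy mentioned above builds on Sharpness-Aware Minimization (SAM), which first ascends to a nearby high-loss point and then takes the descent step from there; the "domain ascent bias" refers to one training domain dominating that ascent direction. The paper's exact CSAM procedure is not given in this abstract, so the following is only a minimal sketch of a plain SAM update on a toy quadratic loss (the loss, gradient, and function names are illustrative, not from the paper):

```python
import numpy as np

# Toy loss L(w) = 0.5 * ||w||^2 with analytic gradient, purely for illustration.
def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: ascend to a nearby high-loss point, then descend.

    rho controls the ascent radius; in CSAM this ascent is what the paper
    reportedly rebalances across domains (details not in the abstract).
    """
    g = grad(w)
    # Ascent direction: normalized gradient scaled to the rho-ball boundary.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent uses the gradient evaluated at the perturbed (sharper) point.
    g_sam = grad(w + eps)
    return w - lr * g_sam
```

For w = [1, 0] this perturbs to [1.05, 0] before descending, so a single step moves slightly further than plain gradient descent would, biasing the solution toward flatter minima.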
Abstract (translated)
With the proliferation of Audio Language Model (ALM) based deepfake audio, effective detection methods are essential. Unlike traditional deepfake audio generation, which often involves multi-step processes ending with a vocoder, ALMs use neural codecs to decode discrete codes directly into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of ALM-based audio generation: the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset covering two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. In addition, to achieve universal detection of deepfake audio and address the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minimum. Experimental results show that co-training on the Codecfake dataset and a vocoded dataset with the CSAM strategy achieves the lowest average Equal Error Rate (EER) of 0.616% across all test conditions.
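Both abstracts report performance as Equal Error Rate (EER): the operating point where the false-acceptance rate on fake audio equals the false-rejection rate on real audio. A minimal sketch of computing EER from detector scores (assuming the convention that a higher score means "real", label 1; the function name and convention are illustrative, not from the paper):

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: find the threshold where FAR is closest to FRR.

    scores: detector outputs, higher = more likely real (assumption).
    labels: 1 for real audio, 0 for fake audio.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_eer, best_gap = 1.0, np.inf
    # Sweep every distinct score as a candidate decision threshold.
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # fake accepted as real
        frr = np.mean(scores[labels == 1] < t)   # real rejected as fake
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer
```

With this sweep, a perfectly separating detector yields an EER of 0, and random scoring yields roughly 0.5; the 0.616% figure above corresponds to near-perfect separation across the test conditions.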
URL
https://arxiv.org/abs/2405.04880