Abstract
In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a detector once deployed: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part serves as a position statement on future research directions in the field and a caveat to a flourishing market of fake-content checkers.
URL
https://arxiv.org/abs/2405.04181