Abstract
The widespread availability of smart devices has led to an exponential increase in multimedia content. However, rapid advancements in deep learning have given rise to sophisticated algorithms capable of manipulating or creating fake multimedia content, known as Deepfakes. Audio Deepfakes pose a significant threat by producing highly realistic voices, thus facilitating the spread of misinformation. To address this issue, numerous audio anti-spoofing detection challenges have been organized to foster the development of anti-spoofing countermeasures. This survey paper presents a comprehensive review of every component within the detection pipeline, including algorithm architectures, optimization techniques, application generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability. For each aspect, we conduct a systematic evaluation of recent advancements, along with discussions of existing challenges. We also explore emerging research topics in audio anti-spoofing, including partial spoofing detection, cross-dataset evaluation, and adversarial attack defence, and propose promising directions for future work. This survey paper not only identifies the current state of the art to establish strong baselines for future experiments but also gives future researchers a clear path toward understanding and enhancing audio anti-spoofing detection mechanisms.
URL
https://arxiv.org/abs/2404.13914