SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

2022-11-11 09:05:50

Jiangyan Yi, Chenglong Wang, Jianhua Tao, Zhengkun Tian, Cunhang Fan, Haoxin Ma, Ruibo Fu

arXiv_CL

Abstract
Abstract (translated)
URL
PDF

Abstract

Previous databases have been designed to further the development of fake audio detection. However, fake utterances are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audios. They ignore a fake situation, in which the attacker manipulates an acoustic scene of the original audio with another forgery one. It will pose a major threat to our society if some people misuse the manipulated audio with malicious purpose. Therefore, this motivates us to fill in the gap. This paper designs such a dataset for scene fake audio detection (SceneFake). A manipulated audio in the SceneFake dataset involves only tampering the acoustic scene of an utterance by using speech enhancement technologies. We can not only detect fake utterances on a seen test set but also evaluate the generalization of fake detection models to unseen manipulation attacks. Some benchmark results are described on the SceneFake dataset. Besides, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios are presented on the dataset. The results show that scene manipulated utterances can not be detected reliably by the existing baseline models of ASVspoof 2019. Furthermore, the detection of unseen scene manipulation audio is still challenging.

Abstract (translated)

URL

https://arxiv.org/abs/2211.06073

PDF

https://arxiv.org/pdf/2211.06073.pdf