Abstract
Speaker recognition technology is applied in tasks ranging from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly additive perturbations, remains a significant challenge. In this paper, we pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition. We close this gap by transferring to speaker recognition, and improving, randomized smoothing certification techniques against norm-bounded additive perturbations for classification and few-shot learning tasks. We demonstrate the effectiveness of these methods on the VoxCeleb 1 and 2 datasets for several models. We expect this work to improve the robustness of voice biometrics, establish a new certification benchmark, and accelerate research on certification methods in the audio domain.
URL
https://arxiv.org/abs/2404.18791