Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

2022-04-24 03:31:05

Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He

arXiv_SD

arXiv_SD CNN Classification Attention Pose Few-Shot Speech

Abstract

Although few-shot learning has attracted considerable attention in image and audio classification, little work has addressed few-shot speaker identification. In few-shot learning, overfitting is a persistent problem, mainly because of the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method that alleviates the overfitting problem. In the proposed method, a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are drawn from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method outperforms state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.
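
The abstract names two ingredients without giving details: a prototypical loss and depthwise separable convolutions. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes the commonly used prototypical formulation (class prototype = mean of support embeddings, softmax over negative squared Euclidean distances); the paper's exact variant may differ. The function names, the toy 2-D embeddings, and the layer sizes are hypothetical and chosen only for illustration. The channel attention module is not sketched because the abstract does not specify its form.

```python
import math


def prototypes(support):
    """Mean embedding per speaker. support: {speaker: [embedding, ...]}."""
    protos = {}
    for speaker, embs in support.items():
        dim = len(embs[0])
        protos[speaker] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return protos


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support, queries):
    """Mean negative log-likelihood and accuracy over the query set.

    queries: list of (embedding, true_speaker) pairs.
    Assumption: logits are negative squared distances to each prototype.
    """
    protos = prototypes(support)
    speakers = list(protos)
    total_nll, correct = 0.0, 0
    for emb, label in queries:
        logits = [-sq_dist(emb, protos[s]) for s in speakers]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total_nll += log_z - logits[speakers.index(label)]
        if speakers[logits.index(m)] == label:
            correct += 1
    return total_nll / len(queries), correct / len(queries)


def conv_params(k, c_in, c_out):
    """Weight counts (no bias) for a k x k standard conv vs. a depthwise
    separable conv (k x k depthwise followed by 1 x 1 pointwise)."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


# Toy 2-way 2-shot episode with hypothetical 2-D embeddings.
support = {"spk_A": [[0.0, 0.0], [0.0, 2.0]], "spk_B": [[4.0, 0.0], [4.0, 2.0]]}
queries = [([0.0, 1.0], "spk_A"), ([4.0, 1.0], "spk_B")]
loss, acc = prototypical_loss(support, queries)
print(f"loss={loss:.6f} acc={acc:.2f}")
print(conv_params(3, 64, 128))  # (73728, 8768)
```

The parameter count shows why a depthwise separable layer is a plausible fit for a low-data setting: for a hypothetical 3 x 3 layer mapping 64 to 128 channels, it needs roughly one eighth of the weights of a standard convolution, which leaves less capacity to overfit a handful of support utterances.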

URL

https://arxiv.org/abs/2204.11180

PDF

https://arxiv.org/pdf/2204.11180.pdf