Paper Reading AI Learner

Backend Ensemble for Speaker Verification and Spoofing Countermeasure

2022-07-05 04:10:45
Li Zhang, Yue Li, Huan Zhao, Qing Wang, Lei Xie

Abstract

This paper describes the NPU system submitted to the Spoofing Aware Speaker Verification (SASV) Challenge 2022. We focus on the backend ensemble for speaker verification and spoofing countermeasure from three aspects. First, in addition to simple concatenation, we propose circulant matrix transformation and stacking for speaker embeddings and countermeasure embeddings. By stacking the newly defined circulant embeddings, we explore almost all possible interactions between speaker embeddings and countermeasure embeddings. Second, we try different convolutional neural networks that use convolution kernels to selectively fuse the salient regions of the embeddings into channels. Finally, we design parallel attention in 1D convolutional neural networks to learn the global correlation along the channel dimension as well as the important parts along the feature dimension. Meanwhile, we embed squeeze-and-excitation attention in 2D convolutional neural networks to learn the global dependence between speaker embeddings and countermeasure embeddings. Experimental results demonstrate that all of the above methods are effective. After fusing four well-trained models enhanced by these methods, our best SASV-EER, SPF-EER and SV-EER on the evaluation set are 0.559%, 0.354% and 0.857%, respectively. With these contributions, our submitted system achieves fifth place in the challenge.
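The abstract does not spell out how the circulant embeddings are built or stacked, so the Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes both embeddings have already been brought to the same length d (for example by a projection layer, which is an assumption), takes the circulant matrix of the speaker embedding (row i is the embedding cyclically shifted right by i), and stacks it with the countermeasure embedding repeated on every row. Under that assumption, position (i, j) pairs sv[(j - i) % d] with cm[j], so each of the d*d speaker/countermeasure element pairs appears exactly once, which is one way to read "almost all possible interactions". The names concat_embeddings, circulant and stack_circulant are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def concat_embeddings(sv: Vector, cm: Vector) -> Vector:
    """Baseline fusion: plain concatenation of speaker (sv) and CM (cm) embeddings."""
    return list(sv) + list(cm)


def circulant(v: Vector) -> Matrix:
    """Circulant matrix of v: entry (i, j) is v[(j - i) % n], so row i is v
    cyclically shifted right by i positions."""
    n = len(v)
    return [[v[(j - i) % n] for j in range(n)] for i in range(n)]


def stack_circulant(sv: Vector, cm: Vector) -> List[Matrix]:
    """Hypothetical circulant stacking, returned as a (2, d, d) nested list.

    Channel 0 is circulant(sv); channel 1 repeats cm on every row. Position
    (i, j) therefore pairs sv[(j - i) % d] with cm[j], and every (sv[a], cm[b])
    pair occurs at exactly one position (i = (b - a) % d, j = b).
    """
    if len(sv) != len(cm):
        # Assumption: both embeddings were projected to the same length d.
        raise ValueError("speaker and CM embeddings must have the same length")
    d = len(sv)
    # Copy cm for each row so the rows are independent lists.
    return [circulant(sv), [list(cm) for _ in range(d)]]


if __name__ == "__main__":
    sv = [1.0, 2.0, 3.0]     # toy speaker embedding (d = 3)
    cm = [10.0, 20.0, 30.0]  # toy countermeasure embedding (d = 3)
    x = stack_circulant(sv, cm)
    # Print the (speaker, CM) value pair seen at each position of each row.
    for i in range(3):
        print([(x[0][i][j], x[1][i][j]) for j in range(3)])
```

In this reading, the (2, d, d) result would serve as a two-channel input to a 2D CNN, while concat_embeddings corresponds to the simple concatenation baseline the abstract mentions.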

URL

https://arxiv.org/abs/2207.01802

PDF

https://arxiv.org/pdf/2207.01802.pdf