Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

2021-10-18 10:12:51

Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

arXiv_SD

Abstract

This paper presents a post-challenge performance analysis on cross-lingual speaker verification of the IDLab submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We show that current speaker embedding extractors consistently underestimate speaker similarity in within-speaker cross-lingual trials. Consequently, the typical training and scoring protocols do not put enough emphasis on the compensation of intra-speaker language variability. We propose two techniques to increase cross-lingual speaker verification robustness. First, we enhance our previously proposed Large-Margin Fine-Tuning (LM-FT) training stage with a mini-batch sampling strategy which increases the number of intra-speaker cross-lingual samples within the mini-batch. Second, we incorporate language information in the logistic regression calibration stage by integrating quality metrics based on soft and hard decisions of a VoxLingua107 language identification model. The proposed techniques result in an 11.7% relative improvement over the baseline model on the VoxSRC-21 test set and contributed to our third-place finish in the corresponding challenge.
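
The second technique can be illustrated with a minimal sketch. It assumes that each utterance comes with a vector of language posteriors from some language identification model, and that the calibrated score is a linear combination of the raw verification score and one language-based quality measure. The function names, the exact form of the hard and soft measures, and the weights below are illustrative assumptions, not the definitions used in the paper; in practice the weights would be learned by logistic regression on a calibration set.

```python
def hard_language_mismatch(post_a, post_b):
    """Hard decision (assumed form): 1.0 if the top-scoring language
    differs between the two utterances of a trial, else 0.0."""
    top_a = max(range(len(post_a)), key=post_a.__getitem__)
    top_b = max(range(len(post_b)), key=post_b.__getitem__)
    return 0.0 if top_a == top_b else 1.0


def soft_language_mismatch(post_a, post_b):
    """Soft decision (assumed form): 1 minus the inner product of the
    two posterior vectors, i.e. 1 minus the probability that both
    utterances share a language if the posteriors are independent."""
    return 1.0 - sum(pa * pb for pa, pb in zip(post_a, post_b))


def calibrate(score, quality, w_score, w_quality, bias):
    """Linear calibration with one quality measure as an extra input.
    The weights are placeholders for values fitted by logistic regression."""
    return w_score * score + w_quality * quality + bias


# Hypothetical posteriors over two languages for the two sides of a trial.
post_en = [0.9, 0.1]
post_nl = [0.2, 0.8]

quality = hard_language_mismatch(post_en, post_nl)
same_lang = calibrate(0.4, 0.0, w_score=10.0, w_quality=2.0, bias=-4.0)
cross_lang = calibrate(0.4, quality, w_score=10.0, w_quality=2.0, bias=-4.0)
```

With a positive weight on the mismatch measure, a cross-lingual trial receives a higher calibrated score than a same-language trial with the same raw score, which is one way to counteract the underestimated similarity described above.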

URL

https://arxiv.org/abs/2110.09150

PDF

https://arxiv.org/pdf/2110.09150.pdf