The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

Abstract

This paper describes the system we developed for the VoxCeleb Speaker Diarisation Challenge 2020. A well-trained neural-network-based speech enhancement model is used for pre-processing. It is followed by a neural-network-based voice activity detection (VAD) system that removes background music and noise, both of which are harmful to speaker diarisation. The diarisation system itself is built on agglomerative hierarchical clustering (AHC) of x-vectors and variational Bayesian hidden Markov model (VB-HMM) based iterative clustering. Experimental results show that the proposed system yields substantial improvements over the baseline method for the diarisation task of the VoxCeleb Speaker Recognition Challenge 2020.
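
The AHC stage groups per-segment speaker embeddings (x-vectors) into speaker clusters by repeatedly merging the two most similar clusters. The abstract does not state the scoring back-end, linkage rule, or stopping threshold, so the following is a minimal sketch under assumptions: cosine similarity is swapped in as the segment score, average linkage is used, and merging stops once the best cluster pair falls below a hypothetical `threshold`. It does not reproduce the paper's system or the VB-HMM clustering stage; its output labels are simply the kind of initial speaker assignment such a stage could refine.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def ahc(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage AHC over segment embeddings (illustrative sketch).

    Merges clusters until the best average cross-cluster cosine similarity
    drops below `threshold` (an assumed, hypothetical stopping rule).
    Returns one label per segment, numbered by order of first appearance.
    """
    n = len(embeddings)
    # Segment-level similarity matrix, computed once up front.
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per segment

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster segment pairs.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, bi, bj = -math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = linkage(clusters[i], clusters[j])
                if s > best:
                    best, bi, bj = s, i, j
        if best < threshold:
            break  # no remaining pair is similar enough to be the same speaker
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]  # bj > bi, so index bi is unaffected

    # Label clusters in order of their earliest segment.
    labels = [0] * n
    for k, members in enumerate(sorted(clusters, key=min)):
        for seg in members:
            labels[seg] = k
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings": segments 0, 1, 4 point one way; 2, 3 another.
    segs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
    print(ahc(segs, threshold=0.5))  # → [0, 0, 1, 1, 0]
```

The threshold controls how many speakers are found: a value above every pairwise similarity leaves each segment in its own cluster, while a very low value merges everything into one speaker. Real x-vectors are high-dimensional, and a trained scoring back-end would normally replace plain cosine similarity.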

URL

https://arxiv.org/abs/2010.11657

PDF

https://arxiv.org/pdf/2010.11657.pdf