Rapid Connectionist Speaker Adaptation

2022-11-15 00:15:11

Michael Witbrock, Patrick Haffner

arXiv_AI Recognition Speech

Abstract

We present SVCnet, a system for modelling speaker variability. Encoder neural networks, each specialized for one speech sound, produce low-dimensional models of acoustic variation, and these models are then combined into an overall model of voice variability. We describe a training procedure that minimizes the dependence of this model on which sounds have been uttered. Given the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. We also describe a system that combines SVCnet with an MS-TDNN recognizer.
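
The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear encoders with random weights, the dimensions, the averaging used to combine per-sound codes, and all function names (`make_encoder`, `speaker_voice_code`, `adapt_input`) are assumptions made for illustration. The abstract does not say how the per-sound models are combined or how the code is fed to the recognizer.

```python
import random

CODE_DIM = 2    # assumed size of each low-dimensional code
FRAME_DIM = 4   # assumed size of one acoustic feature frame


def make_encoder(seed):
    """Hypothetical stand-in for a trained per-sound encoder: random linear weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(FRAME_DIM)]
            for _ in range(CODE_DIM)]


def encode(weights, frame):
    """Project one acoustic frame down to a CODE_DIM-dimensional code."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def speaker_voice_code(encoders, sample):
    """Combine per-sound codes into one voice code.

    `sample` is a list of (sound, frame) pairs. Codes are averaged within each
    sound first and then across sounds, so repeating one sound does not
    dominate the result. This averaging is an assumption, not the paper's
    training procedure.
    """
    per_sound = {}
    for sound, frame in sample:
        per_sound.setdefault(sound, []).append(encode(encoders[sound], frame))
    return mean_vectors([mean_vectors(codes) for codes in per_sound.values()])


def adapt_input(frame, voice_code):
    """Assumed adaptation step: append the voice code to each recognizer input frame."""
    return list(frame) + list(voice_code)


# A brief, unconstrained sample from a new speaker (made-up feature values).
encoders = {"a": make_encoder(0), "s": make_encoder(1)}
sample = [
    ("a", [0.1, 0.2, 0.3, 0.4]),
    ("s", [0.5, 0.1, 0.0, 0.2]),
    ("a", [0.1, 0.2, 0.3, 0.4]),
]
voice_code = speaker_voice_code(encoders, sample)
adapted = adapt_input([0.3, 0.3, 0.3, 0.3], voice_code)
```

In this sketch the recognizer itself is never retrained; only its input is extended with the fixed-length voice code, which is one simple way to read "adapt a recognition system to the new speaker without retraining".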

URL

https://arxiv.org/abs/2211.08978

PDF

https://arxiv.org/pdf/2211.08978.pdf