Real-time low-resource phoneme recognition on edge devices

Abstract

Although speech recognition has attracted a surge of interest and research over the last decade, most machine learning models for speech recognition require either large training datasets or substantial storage and memory. Because English is the language for which the most audio data is available, most other languages currently lack good speech recognition models. The method presented in this paper shows how to create and train speech recognition models for any language that are not only highly accurate but also require very little storage, memory, and training data compared with traditional models. This makes it possible to train models for any language and deploy them on edge devices, such as mobile phones or car displays, for fast real-time speech recognition.

URL

https://arxiv.org/abs/2103.13997

PDF

https://arxiv.org/pdf/2103.13997.pdf