FUN! Fast, Universal, Non-Semantic Speech Embeddings

2020-11-09 18:07:06

Jacob Peplinski, Joel Shor, Sachin Joglekar, Jake Garrison, Shwetak Patel

arXiv_SD Detection Embedding Knowledge Pose Speech

Abstract

Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings where run-time performance is a significant bottleneck. We propose a class of lightweight universal speech embedding models based on MobileNet that are designed to run efficiently on mobile devices. These embeddings, which capture the non-semantic aspects of speech and can thus be reused for several tasks, are trained via knowledge distillation. We show that these embedding models are fast enough to run in real time on a variety of mobile devices and exhibit negligible performance degradation on most tasks in a recently published benchmark of non-semantic speech tasks. Furthermore, we demonstrate that these representations are useful for mobile health tasks such as mask detection during speech and detection of non-speech human sounds.
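
The abstract names knowledge distillation as the training method without describing it. As a rough illustration only, the sketch below trains a small "student" to reproduce the embeddings of a fixed "teacher" by minimizing a squared-error loss. Everything in it is an assumption made for illustration: the toy teacher function, the linear student, the squared-error loss, and all names (teacher_embed, student_embed, distill) are hypothetical and are not taken from the paper, which uses MobileNet-based students.

```python
import math
import random


def teacher_embed(x):
    # Hypothetical stand-in for a large teacher model:
    # a fixed nonlinear map from 3 input features to a 2-dim embedding.
    return [math.tanh(x[0] + 0.5 * x[1]), math.tanh(x[1] - 0.5 * x[2])]


def student_embed(weights, x):
    # Linear student: one weight row per output dimension.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distillation_loss(weights, inputs):
    # Squared error between student and teacher embeddings,
    # summed over embedding dimensions and averaged over examples.
    total = 0.0
    for x in inputs:
        t = teacher_embed(x)
        s = student_embed(weights, x)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(inputs)


def distill(inputs, out_dim=2, lr=0.05, steps=200):
    # Full-batch gradient descent on the distillation loss.
    # No labels are used: the teacher's outputs are the only targets.
    in_dim = len(inputs[0])
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    for _ in range(steps):
        grads = [[0.0] * in_dim for _ in range(out_dim)]
        for x in inputs:
            t = teacher_embed(x)
            s = student_embed(weights, x)
            for k in range(out_dim):
                err = 2.0 * (s[k] - t[k]) / len(inputs)
                for j in range(in_dim):
                    grads[k][j] += err * x[j]
        for k in range(out_dim):
            for j in range(in_dim):
                weights[k][j] -= lr * grads[k][j]
    return weights


# Toy, unlabeled "audio features" (random numbers, not real data).
rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(64)]

initial = distillation_loss([[0.0] * 3 for _ in range(2)], inputs)
student = distill(inputs)
final = distillation_loss(student, inputs)
print(f"loss before: {initial:.4f}, after: {final:.4f}")
```

The point of the sketch is only that the student learns from the teacher's outputs rather than from task labels, which is what allows a much smaller model to inherit a general-purpose embedding.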

URL

https://arxiv.org/abs/2011.04609

PDF

https://arxiv.org/pdf/2011.04609.pdf