Query-by-Example Keyword Spotting system using Multi-head Attention and Softtriple Loss

2021-02-14 03:37:37

Jinmiao Huang, Waseem Gharbieh, Han Suk Shim, Eugene Kim

arXiv_CL

Abstract

This paper proposes a neural network architecture for the query-by-example, user-defined keyword spotting task. A multi-head attention module is added on top of a multi-layered GRU for effective feature extraction, and a normalized multi-head attention module is proposed for feature aggregation. We also adopt the softtriple loss, a combination of triplet loss and softmax loss, and showcase its effectiveness. We demonstrate the performance of our model on internal datasets in different languages and on the public Hey-Snips dataset. We compare our model against a baseline system and conduct an ablation study to show the benefit of each component in our architecture. The proposed work shows solid performance while preserving simplicity.
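
To make the loss mentioned in the abstract more concrete, the following is a minimal NumPy sketch of the general softtriple formulation: each class keeps several learnable centers, a softmax over those centers gives a relaxed per-class similarity, and a margin is subtracted from the true-class similarity before a softmax cross-entropy. The function name, the array shapes, and the hyperparameter values (lam, gamma, delta) are illustrative assumptions; the abstract does not give the authors' exact variant or settings.

```python
import numpy as np


def softtriple_loss(embeddings, labels, centers, lam=20.0, gamma=0.1, delta=0.01):
    """Sketch of a softtriple-style loss (hypothetical helper, not the authors' code).

    embeddings: (N, D) array, assumed L2-normalized.
    labels:     (N,) integer class ids in [0, C).
    centers:    (C, K, D) array of K centers per class, assumed L2-normalized.
    lam, gamma, delta: scale, center-softmax temperature, margin (assumed values).
    """
    # sim[n, c, k] is the dot product between sample n and center k of class c.
    sim = np.einsum("nd,ckd->nck", embeddings, centers)

    # Softmax over the K centers of each class (numerically stabilized).
    scaled = sim / gamma
    scaled = scaled - scaled.max(axis=2, keepdims=True)
    weights = np.exp(scaled)
    weights = weights / weights.sum(axis=2, keepdims=True)

    # Relaxed similarity between each sample and each class, shape (N, C).
    class_sim = (weights * sim).sum(axis=2)

    # Scale the similarities and subtract the margin from the true class only.
    rows = np.arange(len(labels))
    logits = lam * class_sim
    logits[rows, labels] -= lam * delta

    # Softmax cross-entropy against the true labels.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

In a query-by-example setting, such a loss would only be used to train the embedding network; at test time a query is typically compared with enrolled examples through the similarity of their embeddings, and the class centers are not needed.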

URL

https://arxiv.org/abs/2102.07061

PDF

https://arxiv.org/pdf/2102.07061.pdf