Multi-Sample Dynamic Time Warping for Few-Shot Keyword Spotting

2024-04-23 10:36:23
Kevin Wilkinghoff, Alessia Cornaggia-Urrigshardt

Abstract

In multi-sample keyword spotting, each keyword class is represented by multiple spoken instances, called samples. A naïve approach to detecting keywords in a target sequence is to query all samples of all classes using sub-sequence dynamic time warping. However, the resulting processing time grows linearly with the number of samples per class. Alternatively, a single Fréchet mean can be queried for each class, which reduces processing time but usually also degrades detection performance, since the variability of the query samples is not captured sufficiently well. In this work, multi-sample dynamic time warping is proposed to compute class-specific cost tensors that capture the variability of all query samples. To significantly reduce the computational complexity during inference, these cost tensors are converted into cost matrices before applying dynamic time warping. Experimental evaluations for few-shot keyword spotting show that this method performs very similarly to using all individual query samples as templates, while its runtime is only slightly slower than that of using Fréchet means.
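
As a rough illustration of the core idea, here is a minimal NumPy sketch, not the authors' implementation: per-sample cost matrices are stacked into a class-specific cost tensor, the sample axis is collapsed into a single cost matrix, and sub-sequence DTW then runs once per class on that matrix. The element-wise minimum reduction and the assumption that all query samples of a class have already been warped to a common length (e.g., by aligning each to the class's Fréchet mean) are assumptions of this sketch, and the function names (pairwise_cost, multi_sample_cost_matrix, subsequence_dtw) are illustrative.

```python
import numpy as np

def pairwise_cost(query, target):
    """Frame-wise Euclidean distances between one query sample (M, F)
    and the target sequence (N, F); returns an (M, N) cost matrix."""
    return np.linalg.norm(query[:, None, :] - target[None, :, :], axis=-1)

def multi_sample_cost_matrix(samples, target, reduce=np.min):
    """Stack the K per-sample cost matrices into a (K, M, N) cost tensor,
    then collapse the sample axis into a single (M, N) cost matrix.
    Assumes all samples were warped to a common length M beforehand."""
    tensor = np.stack([pairwise_cost(s, target) for s in samples])
    return reduce(tensor, axis=0)

def subsequence_dtw(cost):
    """Sub-sequence DTW on an (M, N) cost matrix: the query may start and
    end anywhere in the target. Returns the minimal matching cost."""
    M, N = cost.shape
    D = np.full((M, N), np.inf)
    D[0, :] = cost[0, :]                  # free start anywhere in the target
    for i in range(1, M):
        D[i, 0] = cost[i, 0] + D[i - 1, 0]
        for j in range(1, N):
            D[i, j] = cost[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, :].min()                 # free end anywhere in the target

# Toy usage with random feature sequences (e.g., 13-dim MFCC-like frames):
rng = np.random.default_rng(0)
samples = [rng.normal(size=(20, 13)) for _ in range(5)]  # 5 aligned query samples
target = rng.normal(size=(200, 13))                      # long target sequence
score = subsequence_dtw(multi_sample_cost_matrix(samples, target))
```

Collapsing the tensor before alignment is what keeps inference close to Fréchet-mean speed: only one accumulated-cost matrix is computed per class instead of one per query sample, while the reduction over the sample axis still retains some of the samples' variability.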

URL

https://arxiv.org/abs/2404.14903

PDF

https://arxiv.org/pdf/2404.14903.pdf

