Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

2021-06-11 04:09:54

Éric Le Ferrand, Steven Bird, Laurent Besacier

arXiv_CL


Abstract

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust automatic speech recognition (ASR) system. This work is grounded in a very low-resource language documentation scenario, where only a few minutes of recording have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target-language speech, can be used for spoken term detection with better overall performance than a dynamic time warping approach. In addition, we show that representing phoneme recognition ambiguity in a graph structure can further boost recall while maintaining high precision in the low-resource spoken term detection task.
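The abstract names dynamic time warping (DTW) as the baseline approach. As a rough illustration only, the sketch below shows subsequence DTW for query-by-example search: a spoken query is aligned in full against any stretch of a longer recording, with free start and end points. It assumes each recording is a list of feature frames and uses Euclidean frame distance; the features, distance measure, and detection threshold used in the paper are not given here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int, int]:
    """Hypothetical helper: find the utterance segment that best matches the query.

    Returns (cost, start, end), where start and end are inclusive frame
    indices into the utterance. Every query frame must be aligned, but the
    match may begin and end anywhere in the utterance.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")
    inf = float("inf")
    # D[i][j]: cheapest cost of aligning query[0..i] with a path ending at utterance frame j
    D = [[inf] * m for _ in range(n)]
    # S[i][j]: utterance frame where that cheapest path started
    S = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query frame may align with any utterance frame
        D[0][j] = euclidean(query[0], utterance[j])
        S[0][j] = j
    for i in range(1, n):
        for j in range(m):
            cost = euclidean(query[i], utterance[j])
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1), horizontal (i, j-1)
            best, start = D[i - 1][j], S[i - 1][j]
            if j > 0:
                if D[i - 1][j - 1] < best:
                    best, start = D[i - 1][j - 1], S[i - 1][j - 1]
                if D[i][j - 1] < best:
                    best, start = D[i][j - 1], S[i][j - 1]
            D[i][j] = cost + best
            S[i][j] = start
    # Free end: the best match may finish at any utterance frame
    end = min(range(m), key=lambda j: D[n - 1][j])
    return D[n - 1][end], S[n - 1][end], end


if __name__ == "__main__":
    query = [[0.0], [1.0], [2.0]]
    utterance = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
    print(subsequence_dtw(query, utterance))  # (0.0, 2, 4)
```

In a detection setting, one would typically declare a hit when the returned cost, normalized for example by the query length, falls below a tuned threshold; that decision rule is also an assumption of this sketch rather than a detail stated in the abstract.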

URL

https://arxiv.org/abs/2106.06160

PDF

https://arxiv.org/pdf/2106.06160.pdf