Low-resource speech recognition and dialect identification of Irish in a multi-task framework

2024-05-02 13:54:39

Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide

Abstract

This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for low-resource speech recognition (ASR) and dialect identification (DID) of Irish (Gaelic). Results are compared to the current best-performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is first established using a Conformer encoder. This setting is then used to train a model with an E-branchformer encoder, and the performance of the two architectures is compared. A multi-task fine-tuning approach is adopted for language model (LM) shallow fusion. The experiments yielded an improvement in DID accuracy of 10.8% relative to a baseline ECAPA-TDNN, and WER performance approaching that of the TDNN-HMM model. This multi-task approach emerges as a promising strategy for Irish low-resource ASR and DID.
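
The abstract names three ingredients (a hybrid CTC/attention objective, InterCTC auxiliary losses, and LM shallow fusion) without giving their weighting. The sketch below shows one common way such terms are combined, under stated assumptions: the function names, the default weights, and the choice to average the intermediate CTC losses are illustrative and are not taken from the paper.

```python
from typing import Sequence


def hybrid_interctc_loss(
    att_loss: float,
    ctc_loss: float,
    inter_ctc_losses: Sequence[float],
    ctc_weight: float = 0.3,       # assumed value, not from the paper
    interctc_weight: float = 0.3,  # assumed value, not from the paper
) -> float:
    """Combine attention, final-layer CTC and intermediate CTC losses.

    Assumed formulation: the intermediate CTC losses are averaged and
    interpolated with the final-layer CTC loss; the result is then
    interpolated with the attention decoder loss.
    """
    if inter_ctc_losses:
        inter = sum(inter_ctc_losses) / len(inter_ctc_losses)
        ctc_total = (1.0 - interctc_weight) * ctc_loss + interctc_weight * inter
    else:
        # No intermediate layers selected: plain hybrid CTC/attention.
        ctc_total = ctc_loss
    return ctc_weight * ctc_total + (1.0 - ctc_weight) * att_loss


def shallow_fusion_score(
    asr_logprob: float,
    lm_logprob: float,
    lm_weight: float = 0.3,  # assumed value, not from the paper
) -> float:
    """Score a hypothesis by adding a weighted LM log-probability
    to the ASR log-probability (shallow fusion)."""
    return asr_logprob + lm_weight * lm_logprob


if __name__ == "__main__":
    # Toy numbers only, chosen to make the arithmetic easy to follow.
    print(hybrid_interctc_loss(2.0, 4.0, [6.0, 8.0], 0.5, 0.5))
    print(shallow_fusion_score(-1.0, -2.0, 0.5))
```

In a real system the loss values would be tensors produced by the encoder and decoder, and the weights would be tuned on development data. The scalar version here only illustrates how the terms interact.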

URL

https://arxiv.org/abs/2405.01293

PDF

https://arxiv.org/pdf/2405.01293.pdf