ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana

Abstract

Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in the Americas. Track 1 of the Second AmericasNLP Competition at NeurIPS 2022 proposed developing automatic speech recognition (ASR) systems for five indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana. In this paper, we propose a reliable ASR model for each target language by crawling speech corpora spanning diverse sources and applying data augmentation methods, which resulted in the winning approach in this competition. To achieve this, we systematically investigated the impact of different hyperparameters on model performance through a Bayesian search, focusing on two variants of the Wav2vec2.0 XLS-R model: 300M and 1B parameters. Moreover, we performed a global sensitivity analysis to assess the contribution of various hyperparameter configurations to the performance of our best models. Importantly, our results show that the number of freeze fine-tuning updates and the dropout rate are more important parameters than the total number of epochs or the learning rate (lr). Additionally, we release our best models (no other ASR model has been reported until now for two of the languages, Wa'ikhana and Kotiria) along with the many experiments performed, to pave the way for other researchers to continue improving ASR for minority languages. This insight opens up interesting avenues for future work, allowing for the advancement of ASR techniques in the preservation of minority indigenous languages while acknowledging the complexities involved in this important endeavour.
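The abstract does not say which global sensitivity method was used. As a rough illustration of the general idea only, the sketch below estimates a variance-based first-order index, S_i = Var(E[score | x_i]) / Var(score), for each hyperparameter over a balanced grid. Everything in it is an assumption made for illustration: the grid values, the function names `first_order_indices` and `toy_score`, and the toy scoring function itself, whose weights are arbitrary and do not reproduce the paper's experiments or results.

```python
from collections import defaultdict
from itertools import product
from statistics import fmean, pvariance


def first_order_indices(configs, scores):
    """Estimate S_i = Var(E[score | x_i]) / Var(score) per hyperparameter.

    configs: list of dicts mapping hyperparameter name -> value
    scores:  list of floats, one per config
    Assumes a balanced (full-factorial) design, so every value of a
    hyperparameter appears equally often and group means can be
    weighted equally.
    """
    total_var = pvariance(scores)
    indices = {}
    for name in configs[0]:
        # Group the scores by the value this hyperparameter takes.
        groups = defaultdict(list)
        for cfg, score in zip(configs, scores):
            groups[cfg[name]].append(score)
        # Variance of the conditional means, relative to total variance.
        cond_means = [fmean(g) for g in groups.values()]
        indices[name] = pvariance(cond_means) / total_var
    return indices


# Hypothetical grid; these values are NOT taken from the paper.
grid = {
    "dropout": [0.0, 0.1, 0.2],
    "freeze_steps": [0, 500, 1000],
    "epochs": [10, 20, 30],
    "lr": [1e-5, 1e-4, 1e-3],
}


def toy_score(cfg):
    # Made-up additive response with arbitrary weights, standing in for
    # a validation metric. It is not a model of real ASR training.
    return (
        10.0 * cfg["dropout"]
        + 0.002 * cfg["freeze_steps"]
        + 0.01 * cfg["epochs"]
        + 100.0 * cfg["lr"]
    )


# Enumerate every combination of grid values and score each one.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
scores = [toy_score(cfg) for cfg in configs]
indices = first_order_indices(configs, scores)

for name, value in sorted(indices.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Because the toy response is purely additive, the first-order indices sum to 1. With real training runs, interactions between hyperparameters would leave part of the variance unexplained by first-order terms, and results from a Bayesian search would not form a balanced grid, so a dedicated sensitivity-analysis method would be needed.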

URL

https://arxiv.org/abs/2404.08368

PDF

https://arxiv.org/pdf/2404.08368.pdf
