
Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information

2024-04-23 20:26:07
Chihiro Taguchi, Jefferson Saransig, Dayana Velásquez, David Chiang


This paper presents Killkan, the first dataset for automatic speech recognition (ASR) in Kichwa, an indigenous language of Ecuador. Kichwa is an extremely low-resource endangered language, and before Killkan no resources existed that would allow it to be incorporated into natural language processing applications. The dataset contains approximately 4 hours of audio with transcription, translation into Spanish, and morphosyntactic annotation in the Universal Dependencies format. The audio data was retrieved from a publicly available radio program in Kichwa. This paper also provides corpus-linguistic analyses of the dataset, with a special focus on the agglutinative morphology of Kichwa and its frequent code-switching with Spanish. The experiments show that, despite its small size, the dataset makes it possible to develop the first ASR system for Kichwa with reliable quality. The dataset, the ASR model, and the code used to develop them will be publicly available. Our study thus serves as a positive example of resource building and its applications for low-resource languages and their communities.



