Svarah: Evaluating English ASR Systems on Indian Accents

2023-05-25 06:20:29

Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

arXiv_SD

Abstract

India is the second-largest English-speaking country in the world, with a speaker base of roughly 130 million. It is therefore imperative that automatic speech recognition (ASR) systems for English be evaluated on Indian accents. Unfortunately, Indian speakers are poorly represented in existing English ASR benchmarks such as LibriSpeech, Switchboard, and the Speech Accent Archive. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data and covers domains such as history, culture, and tourism, ensuring a diverse vocabulary. We evaluate 6 open-source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah, as well as all our code, will be publicly available.

URL

https://arxiv.org/abs/2305.15760

PDF

https://arxiv.org/pdf/2305.15760.pdf