Abstract
Automatic speech recognition (ASR) systems promise to deliver an objective interpretation of human speech. Practice and recent evidence, however, suggest that state-of-the-art (SotA) ASR systems struggle with speech variability due to gender, age, speech impairment, race, and accent. Many factors can cause bias in an ASR system, e.g. the composition of the training material and articulation differences. Our overarching goal is to uncover bias in ASR systems in order to work towards proactive bias mitigation in ASR. This paper systematically quantifies the bias of a SotA ASR system against gender, age, regional accents, and non-native accents. Word error rates are compared, and an in-depth phoneme-level error analysis is conducted to understand where bias occurs. We focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.
URL
https://arxiv.org/abs/2103.15122