Abstract
In recent years, automatic speech recognition (ASR) systems have reached remarkable performance on specific tasks for which sufficient amounts of training data are available, such as LibriSpeech. However, variations in acoustic and recording conditions and in speaking style, together with a lack of sufficient in-domain training data, still pose challenges to the development of accurate models. In this work, we present our efforts to develop ASR systems for three languages (Arabic, German, Vietnamese) as part of a conversational telephone speech translation task in the medical domain. The task aims to support emergency-room interaction between physicians and patients across language barriers. We study different training schedules and data combination approaches to improve system performance, and we analyze where the limited available data is used most efficiently.
URL
https://arxiv.org/abs/2210.13397