Abstract
This paper addresses automatic continuous speech recognition with trainable systems. The aim is to build acoustic models for spoken Swedish using hidden Markov models (HMMs) whose parameters are trained on the SpeechDat database. Acoustic modeling is carried out at the phonetic level, which allows general speech recognition applications, although a simplified task (digit and natural number recognition) is used for model evaluation. Several kinds of phone models are tested, including context-independent models and two variations of context-dependent models. In addition, many experiments with bigram language models are run to tune some of the system parameters. System performance is also examined over speaker subsets that differ in sex, age, and dialect. Compared with previous similar studies, the results show a remarkable improvement.
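To make the notion of a bigram language model over a digit task concrete, the following is a minimal sketch and not the paper's implementation. It assumes add-one smoothing, a toy three-word Swedish digit vocabulary, and a hypothetical `lm_scale` weight standing in for the kind of parameter that is typically tuned in such experiments; none of these choices are taken from the paper.

```python
from collections import Counter
import math


def train_bigram(sentences, vocab):
    """Return a function giving add-one smoothed bigram log-probabilities.

    `sentences` is a list of word lists; `vocab` lists the allowed words.
    Sentence boundaries are modeled with the markers <s> and </s>.
    """
    history = Counter()  # how often each word occurs as a bigram history
    bigram = Counter()   # how often each (previous, current) pair occurs
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            history[prev] += 1
            bigram[(prev, cur)] += 1

    # Anything in the vocabulary, or the end marker, can follow a history.
    num_targets = len(vocab) + 1

    def logprob(prev, cur):
        return math.log((bigram[(prev, cur)] + 1) / (history[prev] + num_targets))

    return logprob


def score(logprob, words, lm_scale=1.0):
    """Scaled log-probability of a word sequence under the bigram model.

    `lm_scale` is a hypothetical weight of the sort a decoder would use
    to balance language model scores against acoustic scores.
    """
    padded = ["<s>"] + list(words) + ["</s>"]
    return lm_scale * sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


# Toy training data (illustrative only, not from SpeechDat).
VOCAB = ["ett", "två", "tre"]
TRAIN = [["ett", "två"], ["ett", "tre"], ["två", "tre"]]
LOGPROB = train_bigram(TRAIN, VOCAB)
```

In a recognizer, a score like this would be combined with the acoustic log-likelihood from the HMMs when ranking competing word sequences; here a word order seen in training, `["ett", "två"]`, scores higher than the unseen order `["tre", "ett"]`.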
URL
https://arxiv.org/abs/2404.16547