Abstract
This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition. Singing poses unique challenges, including extensive pitch variations, diverse vocal styles, and interference from background music. I explore key areas such as phoneme recognition, language identification in songs, keyword spotting, and full lyrics transcription. I describe some of my own experiences performing research on these tasks just as they were starting to gain traction, but also show how recent developments in deep learning and large-scale datasets have propelled progress in this field. My goal is to illuminate the complexities of applying speech recognition to singing, evaluate current capabilities, and outline future research directions.
URL
https://arxiv.org/abs/2403.09298