Abstract
Vocal education in music is difficult to quantify because singers' voices differ individually and singing techniques are judged by differing quantitative criteria. Deep learning has great potential in music education because it handles complex data and performs quantitative analysis efficiently. However, accurately evaluating rare vocal types such as Mezzo-soprano from limited samples requires extensive, well-annotated data to support deep learning models. To address this, we apply transfer learning, using deep learning models pre-trained on the ImageNet and UrbanSound8K datasets to improve the precision of vocal technique evaluation. We also address the shortage of samples by constructing a dedicated dataset for vocal technique assessment, the Mezzo-soprano Vocal Set (MVS). Our experimental results indicate that transfer learning increases the overall accuracy (OAcc) of all models by an average of 8.3%, with the highest accuracy reaching 94.2%. We not only provide a novel approach to evaluating Mezzo-soprano vocal techniques but also introduce a new quantitative assessment method for music education.
URL
https://arxiv.org/abs/2410.23325