Abstract
Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.
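The abstract names metric-based classification and loss reweighting without giving details, so the following is only a minimal sketch of how those two ideas are commonly combined, not the authors' implementation. It assumes a prototype-style classifier (class means in embedding space with squared Euclidean distance) and inverse-frequency class weights; the function names, the toy 2-D "embeddings", and the level labels are all hypothetical stand-ins for real SSL features and ICNALE annotations.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class count, rescaled to average 1."""
    counts = Counter(labels)
    raw = {lab: 1.0 / n for lab, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {lab: w * scale for lab, w in raw.items()}


def class_prototypes(embeddings, labels):
    """Mean embedding per class (one prototype per proficiency level)."""
    counts = Counter(labels)
    sums = {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def class_log_probs(vec, prototypes):
    """Log-softmax over negative squared distances to each prototype."""
    logits = {
        lab: -sum((a - b) ** 2 for a, b in zip(vec, proto))
        for lab, proto in prototypes.items()
    }
    peak = max(logits.values())
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits.values()))
    return {lab: z - log_norm for lab, z in logits.items()}


def weighted_nll(embeddings, labels, prototypes, weights):
    """Class-weighted mean negative log-likelihood (the reweighted loss)."""
    num = 0.0
    den = 0.0
    for vec, lab in zip(embeddings, labels):
        num += -weights[lab] * class_log_probs(vec, prototypes)[lab]
        den += weights[lab]
    return num / den


# Toy, made-up data: an imbalanced set dominated by one level.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1], [2.0, 2.0], [4.0, 0.0]]
labels = ["B1", "B1", "B1", "B1", "A2", "B2"]

weights = inverse_frequency_weights(labels)
prototypes = class_prototypes(embeddings, labels)
loss = weighted_nll(embeddings, labels, prototypes, weights)

# Classify a new (hypothetical) utterance embedding by its nearest prototype.
query_log_probs = class_log_probs([1.9, 2.1], prototypes)
predicted = max(query_log_probs, key=query_log_probs.get)
```

In this sketch the rare levels receive larger weights than the dominant one, so errors on them contribute more to the loss, while the prototype classifier needs only a handful of examples per level to form a class representative. Both properties are relevant to the limited and unevenly distributed data described in the abstract.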
URL
https://arxiv.org/abs/2404.07575