Abstract
We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase we received 100 valid submissions from eight unique teams: 34 submissions for Subtask 1 (five teams), 47 for Subtask 2 (six teams), and 19 for Subtask 3 (two teams). The best-performing systems achieved 79.8% accuracy on Subtask 1, a word error rate / character error rate (WER/CER) of 35.68/12.20 (overall average) on Subtask 2, and a WER/CER of 55/13 on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing across dialect identification, speech recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.
URL
https://arxiv.org/abs/2509.02038