Abstract
Dyslexia affects reading and writing skills across many languages. This work describes a new application of YOLO-based object detection to isolate and label handwriting patterns (Normal, Reversal, Corrected) within synthetic images that resemble real words. Individual letters are first collected, preprocessed into 32×32 samples, then assembled into larger synthetic 'words' to simulate realistic handwriting. Our YOLOv11 framework simultaneously localizes each letter and classifies it into one of three categories, reflecting key dyslexia traits. Empirically, we achieve near-perfect performance, with precision, recall, and F1 scores typically exceeding 0.999. This surpasses earlier single-letter approaches that rely on conventional CNNs or transfer-learning classifiers (for example, MobileNet-based methods in Robaa et al., arXiv:2410.19821). Unlike simpler pipelines that consider each letter in isolation, our solution processes complete word images, resulting in more authentic representations of handwriting. Although relying on synthetic data raises concerns about domain gaps, these experiments highlight the promise of YOLO-based detection for faster and more interpretable dyslexia screening. Future work will expand to real-world handwriting, other languages, and deeper explainability methods to build confidence among educators, clinicians, and families.
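The data-generation step described above (assembling preprocessed 32×32 letter crops into synthetic 'word' images with per-letter detection labels) can be sketched as follows. This is a minimal illustration, not the authors' released code: the class-index mapping, padding, and canvas size are assumptions, and the label tuples follow the standard YOLO convention of normalized (class, x-center, y-center, width, height).

```python
import numpy as np

# Assumed mapping of the three handwriting categories to YOLO class indices.
CLASSES = {"normal": 0, "reversal": 1, "corrected": 2}

def assemble_word(letters, labels, canvas_h=64, pad=4):
    """Concatenate 32x32 letter crops into one synthetic 'word' image.

    Returns the grayscale canvas and a list of YOLO-format boxes
    (class_id, cx, cy, w, h), all coordinates normalized to [0, 1].
    """
    cell = 32  # each preprocessed letter sample is 32x32
    n = len(letters)
    width = pad + n * (cell + pad)          # letters laid out left to right
    canvas = np.full((canvas_h, width), 255, dtype=np.uint8)  # white page
    y0 = (canvas_h - cell) // 2             # vertically center the letters
    boxes = []
    for i, (img, lab) in enumerate(zip(letters, labels)):
        x0 = pad + i * (cell + pad)
        canvas[y0:y0 + cell, x0:x0 + cell] = img
        # Normalized center/size, as expected by YOLO-style training labels.
        cx = (x0 + cell / 2) / width
        cy = (y0 + cell / 2) / canvas_h
        boxes.append((CLASSES[lab], cx, cy, cell / width, cell / canvas_h))
    return canvas, boxes
```

A detector trained on such images can then localize every letter in a word and classify it as normal, reversed, or corrected in a single forward pass, which is the word-level behavior the abstract contrasts with single-letter classifiers.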
URL
https://arxiv.org/abs/2501.15263