Paper Reading AI Learner

Explainable YOLO-Based Dyslexia Detection in Synthetic Handwriting Data

2025-01-25 16:14:16
Nora Fink

Abstract

Dyslexia affects reading and writing skills across many languages. This work describes a new application of YOLO-based object detection to isolate and label handwriting patterns (Normal, Reversal, Corrected) within synthetic images that resemble real words. Individual letters are first collected, preprocessed into 32x32 samples, then assembled into larger synthetic 'words' to simulate realistic handwriting. Our YOLOv11 framework simultaneously localizes each letter and classifies it into one of three categories, reflecting key dyslexia traits. Empirically, we achieve near-perfect performance, with precision, recall, and F1 metrics typically exceeding 0.999. This surpasses earlier single-letter approaches that rely on conventional CNNs or transfer-learning classifiers (for example, MobileNet-based methods in Robaa et al. arXiv:2410.19821). Unlike simpler pipelines that consider each letter in isolation, our solution processes complete word images, resulting in more authentic representations of handwriting. Although relying on synthetic data raises concerns about domain gaps, these experiments highlight the promise of YOLO-based detection for faster and more interpretable dyslexia screening. Future work will expand to real-world handwriting, other languages, and deeper explainability methods to build confidence among educators, clinicians, and families.
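The data-generation step the abstract describes (preprocessed 32×32 letter samples tiled into larger synthetic "words", each letter carrying one of three labels) can be sketched as follows. This is a minimal illustration, not the authors' code: the class ids, tile composition, and YOLO-style normalized label format (class, x_center, y_center, width, height) are assumptions about how such a dataset might be built.

```python
import numpy as np

# Hypothetical class ids mirroring the paper's three categories.
CLASSES = {"Normal": 0, "Reversal": 1, "Corrected": 2}
TILE = 32  # letter samples are 32x32, as stated in the abstract


def compose_word(letter_tiles, letter_classes):
    """Tile 32x32 letter images side by side into one synthetic 'word'
    and emit YOLO-style labels (class, x_center, y_center, w, h),
    all normalized to the composed word image's size."""
    word = np.concatenate(letter_tiles, axis=1)  # shape (32, 32 * n)
    h, w = word.shape
    labels = []
    for i, cls in enumerate(letter_classes):
        x_center = (i * TILE + TILE / 2) / w  # box center along the word
        labels.append((CLASSES[cls], x_center, 0.5, TILE / w, 1.0))
    return word, labels


# Usage: three random tiles standing in for preprocessed letter crops.
tiles = [np.random.randint(0, 256, (TILE, TILE), dtype=np.uint8)
         for _ in range(3)]
word_img, word_labels = compose_word(
    tiles, ["Normal", "Reversal", "Corrected"])
print(word_img.shape)  # (32, 96)
```

A detector trained on such word images sees letters in context rather than in isolation, which is the distinction the abstract draws against single-letter classifiers.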

URL

https://arxiv.org/abs/2501.15263

PDF

https://arxiv.org/pdf/2501.15263.pdf
