Abstract
Optical character recognition (OCR) is a vital process that extracts handwritten or printed text from scanned or printed images and converts it into a format that machines can understand and process. This enables further data processing activities such as searching and editing. The automatic extraction of text through OCR plays a crucial role in digitizing documents, enhancing productivity, improving accessibility, and preserving historical records. This paper seeks to offer an exhaustive review of contemporary applications, methodologies, and challenges associated with Arabic OCR. It analyzes the prevailing techniques used throughout the OCR process, with a dedicated effort to identify the most effective approaches. To ensure a thorough evaluation, a meticulous keyword-search methodology is adopted, covering articles relevant to Arabic OCR and including both backward and forward citation reviews. In addition to presenting cutting-edge techniques and methods, this paper critically identifies research gaps within the realm of Arabic OCR. By highlighting these gaps, we shed light on potential areas for future exploration and development, thereby guiding researchers toward promising avenues in the field. The outcomes of this study provide valuable insights for researchers, practitioners, and stakeholders involved in Arabic OCR, ultimately fostering advancements in the field and facilitating the creation of more accurate and efficient OCR systems for the Arabic language.
Abstract (translated)
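Optical character recognition (OCR) is a key process that involves extracting handwritten or printed text from scanned or printed images and converting it into a form that machines can understand and process. This makes further data processing activities possible, such as searching and editing. Automatic text extraction through OCR plays a key role in digitizing documents, raising productivity, improving accessibility, and preserving historical records. This paper aims to provide a comprehensive review of contemporary Arabic OCR applications, methods, and challenges. It presents an in-depth analysis of the existing techniques used in the OCR process, including a comprehensive review of articles related to Arabic OCR, to ensure a thorough evaluation. In addition, the paper focuses on gaps in the research field, thereby guiding researchers toward promising directions in the field. The results of this study provide valuable insights for researchers, practitioners, and stakeholders in Arabic OCR, thereby advancing the field and laying the groundwork for more accurate and efficient OCR systems for the Arabic language.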
URL
https://arxiv.org/abs/2312.11812