Paper Reading AI Learner

Enhancement of text recognition for hanja handwritten documents of Ancient Korea

2024-12-14 02:29:07
Joonmo Ahna, Taehong Jang, Quan Fengnyu, Hyungil Lee, Jaehyuk Lee, Sojung Lucia Kim

Abstract

We implemented a high-performance optical character recognition (OCR) model for classical handwritten documents, using data augmentation based on highly variable cropping within the document region. OCR for handwritten documents, and for classical documents in particular, has long been a challenging topic for many countries and research organizations. Although many researchers have worked on this topic, the degradation of classical texts over time and the unique stylistic characteristics of individual authors have made it difficult. Recognizing hanja handwritten documents is a meaningful and special challenge, because hanja, which developed by reflecting the vocabulary, semantic, and syntactic features of the Joseon Dynasty, differs from classical Chinese characters. To study this challenge, we used 1100 cursive documents, which are small in size, and augmented each document into 100 training samples by cropping a randomly sized region within it. We trained on these samples using a two-stage object detection model, High Resolution Neural Network (HRNet), and applied the resulting model to achieve a high inference recognition rate of 90% for cursive documents. Through this study, we also confirmed that OCR performance is affected by the simplified characters, variants, variant characters, common characters, and alternators of Chinese characters, effects that are difficult to see in other studies. We propose that the results of this study can be applied to optical character recognition of modern documents in multiple languages, as well as to other typefaces in classical documents.
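The random-size cropping augmentation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the crop-size distribution, so the function names and the `min_frac` and `max_frac` parameters are hypothetical. The second helper assumes character-level bounding boxes are available for detector training and keeps only the boxes that fall entirely inside a crop.

```python
import random


def random_crop_boxes(width, height, n_crops=100, min_frac=0.3, max_frac=1.0, seed=None):
    """Return n_crops random (left, top, right, bottom) boxes inside a page.

    min_frac and max_frac are assumed bounds on the crop size, expressed as a
    fraction of the page width and height; the abstract does not specify them.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in range(n_crops):
        # Sample the crop width and height independently so aspect ratios vary.
        crop_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
        crop_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
        # Place the crop so that it stays fully inside the page.
        left = rng.randint(0, width - crop_w)
        top = rng.randint(0, height - crop_h)
        boxes.append((left, top, left + crop_w, top + crop_h))
    return boxes


def remap_char_boxes(char_boxes, crop):
    """Keep character boxes fully inside the crop, shifted to crop coordinates."""
    cl, ct, cr, cb = crop
    kept = []
    for left, top, right, bottom in char_boxes:
        if left >= cl and top >= ct and right <= cr and bottom <= cb:
            kept.append((left - cl, top - ct, right - cl, bottom - ct))
    return kept


# Usage: 100 crops for one hypothetical 2000 x 3000 pixel page.
crops = random_crop_boxes(2000, 3000, n_crops=100, seed=0)
print(len(crops))  # 100
```

Applied to each of the 1100 source documents with 100 crops apiece, a scheme like this would produce on the order of 110,000 training images. Dropping characters that straddle a crop edge is one possible design choice; clipping them to the crop boundary is another.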

Abstract (translated)

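We implemented a high-performance handwritten optical character recognition model, designed specifically for classical handwritten documents, by using data augmentation with highly variable cropping within the document region. Optical character recognition in handwritten documents, especially classical documents, has long been a difficult problem for many countries and research institutions. Although many researchers have studied this topic, the effect of time on the quality of classical texts and the unique stylistic characteristics of different authors make the problem particularly hard. Recognizing handwritten hanja documents, which developed from the vocabulary, semantic, and syntactic features of the Joseon Dynasty, is a meaningful and special challenge, because hanja differs from traditional Chinese characters. To address this challenge, we used 1100 small-sized cursive documents and augmented each document into 100 samples for training by randomly cropping regions within it. We trained on them using a two-stage object detection model and the High Resolution Neural Network (HRNet), and applied the resulting model to achieve a 90% inference recognition rate on cursive documents. Through this study, we also confirmed that OCR performance is affected by simplified characters, variant characters, variant forms, common characters, and substitute characters of Chinese characters, features that are difficult to observe in other studies. We suggest that the results of this study can be applied to optical character recognition of modern documents in multiple languages, as well as to other types of classical documents.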

URL

https://arxiv.org/abs/2412.10647

PDF

https://arxiv.org/pdf/2412.10647.pdf

