A Scalable Handwritten Text Recognition System

2019-04-19 11:35:27
R. Reeve Ingle, Yasuhisa Fujii, Thomas Deselaers, Jonathan Baccash, Ashok C. Popat

Abstract

Many studies on offline Handwritten Text Recognition (HTR) have focused on building state-of-the-art line recognition models on small corpora. However, adding HTR capability to a large-scale multilingual OCR system poses new challenges. This paper addresses three problems in building such systems: data, efficiency, and integration. First, one of the biggest challenges is obtaining sufficient amounts of high-quality training data. We address this problem by using online handwriting data collected for a large-scale production online handwriting recognition system. We describe our image data generation pipeline and study how online data can be used to build HTR models. We show that the data improve the models significantly when only a small number of real images is available, which is usually the case for HTR models. This enables us to support a new script at substantially lower cost. Second, we propose a line recognition model based on neural networks without recurrent connections. The model achieves accuracy comparable to LSTM-based models while allowing better parallelism in training and inference. Finally, we present a simple way to integrate HTR models into an OCR system. Together, these constitute a solution for bringing HTR capability into a large-scale OCR system.
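The abstract's data idea rests on the fact that online handwriting is recorded as pen trajectories, which can be turned into images for an offline recognizer. The sketch below illustrates only that basic conversion step. It assumes strokes are lists of (x, y) points with y growing downward and rasterizes them into a fixed-height binary line image using plain Bresenham line drawing. The names `render_ink` and `_draw_line`, the 32-pixel height, and the margin are hypothetical choices for illustration. This is not the authors' image data generation pipeline, which the abstract does not detail, and it does not model stroke width, blur, or background texture.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def _draw_line(img: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> None:
    """Set pixels along the segment (x0, y0)-(x1, y1) with Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        img[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_ink(strokes: List[Stroke], height: int = 32, margin: int = 2) -> List[List[int]]:
    """Rasterize pen strokes into a binary line image of fixed height.

    Hypothetical helper: coordinates are scaled uniformly so the ink fits
    in `height` pixels minus the margins; the width follows from the
    ink's aspect ratio.
    """
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    span_y = max(ys) - min_y or 1.0  # avoid division by zero for flat ink
    scale = (height - 1 - 2 * margin) / span_y
    width = int(round((max(xs) - min_x) * scale)) + 1 + 2 * margin
    img = [[0] * width for _ in range(height)]
    for stroke in strokes:
        # Map ink coordinates to pixel coordinates inside the margins.
        pts = [(int(round((x - min_x) * scale)) + margin,
                int(round((y - min_y) * scale)) + margin) for x, y in stroke]
        if len(pts) == 1:
            pts = pts * 2  # a single pen tap becomes a one-pixel dot
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            _draw_line(img, x0, y0, x1, y1)
    return img
```

For example, `render_ink([[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (0.0, 10.0)]])` draws an "L"-like corner (a horizontal and a vertical stroke) into a 32 by 32 image. Keeping the height fixed and letting the width vary mirrors how line images are commonly fed to line recognizers, but that normalization is an assumption of this sketch.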

URL

https://arxiv.org/abs/1904.09150

PDF

https://arxiv.org/pdf/1904.09150.pdf
