Paper Reading AI Learner

Calamari - A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition

2018-07-05 13:46:37
Christoph Wick, Christian Reul, Frank Puppe

Abstract

Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and L\"udeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and pretraining have shown to be very efficient (Reul et al., 2018a, Reul et al., 2018b). Calamari is a new open source OCR line recognition software that both uses state-of-the art Deep Neural Networks (DNNs) implemented in Tensorflow and giving native support for techniques such as pretraining and voting. The customizable network architectures constructed of Convolutional Neural Networks (CNNS) and Long-ShortTerm-Memory (LSTM) layers are trained by the so-called Connectionist Temporal Classification (CTC) algorithm of Graves et al. (2006). Optional usage of a GPU drastically reduces the computation times for both training and prediction. We use two different datasets to compare the performance of Calamari to OCRopy, OCRopus3, and Tesseract 4. Calamari reaches a Character Error Rate (CER) of 0.11% on the UW3 dataset written in modern English and 0.18% on the DTA19 dataset written in German Fraktur, which considerably outperforms the results of the existing softwares.

Abstract (translated)

关于当代和历史数据的光学字符识别(OCR)仍然是许多研究人员关注的焦点。特别是历史版画需要书籍特定的训练OCR模型才能达到适用的结果(Springmann和L \“udeling,2016,Reul等,2017a)。减少人工注意地面实况(GT)的各种技巧,如投票和预训练已经证明是非常有效的(Reul等,2018a,Reul等,2018b).Calamari是一种新的开源OCR线识别软件,它使用了最先进的深度神经网络(DNN)。 Tensorflow并为预训练和投票等技术提供原生支持。由卷积神经网络(CNNS)和长短时间内存(LSTM)层构成的可定制网络架构通过所谓的连接主义时间分类(CTC)算法进行训练。 Graves等人(2006).GPU的可选用法大大减少了训练和预测的计算时间。我们使用两个不同的数据集来比较Calamari与OCRopy,OCRopus3和Tesseract 4. Calamari在用现代英语编写的UW3数据集上达到0.11%的字符错误率(CER),在用德语Fraktur编写的DTA19数据集上达到0.18%,其显着优于现有软件的结果。

URL

https://arxiv.org/abs/1807.02004

PDF

https://arxiv.org/pdf/1807.02004.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot