Paper Reading AI Learner

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

2023-05-04 07:00:28
Vittorio Pippi, Silvia Cascianelli, Christopher Kermorvant, Rita Cucchiara

Abstract

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and the author's handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
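The pretrain-then-fine-tune workflow described above can be sketched in a few lines of PyTorch. The small CRNN-style recognizer, the CTC objective, the checkpoint name, and all hyperparameters below are illustrative assumptions for a line-level HTR setup, not the authors' actual architecture or training configuration.

```python
# Hypothetical sketch of the pretrain-then-fine-tune workflow for single-writer HTR.
# Model, dataset, and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn


class TinyHTR(nn.Module):
    """A deliberately small CRNN-style recognizer: conv features + BiLSTM + CTC head."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):  # x: (B, 1, 32, W) grayscale text-line images
        f = self.features(x)                            # (B, 64, 8, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (B, W/4, 512) frame sequence
        out, _ = self.rnn(f)
        return self.head(out)                           # per-frame class logits


def fine_tune(model, lines, epochs=40, lr=1e-4):
    """Adapt a pretrained recognizer on a handful of (image, target) line pairs."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for img, target in lines:                       # as few as 5 real lines
            logits = model(img.unsqueeze(0))            # (1, T, C)
            log_probs = logits.log_softmax(-1).permute(1, 0, 2)  # (T, 1, C) for CTC
            input_len = torch.tensor([log_probs.size(0)])
            target_len = torch.tensor([target.numel()])
            loss = ctc(log_probs, target.unsqueeze(0), input_len, target_len)
            opt.zero_grad()
            loss.backward()
            opt.step()


# Usage: load weights pretrained on a large (real or synthetic) dataset,
# then adapt them to a single writer with a few annotated lines.
model = TinyHTR(num_classes=80)
# model.load_state_dict(torch.load("pretrained_htr.pt"))  # hypothetical checkpoint
dummy_lines = [(torch.randn(1, 32, 128), torch.randint(1, 80, (12,))) for _ in range(5)]
fine_tune(model, dummy_lines)
```

The point of the sketch is only the workflow: load weights pretrained on a large real or synthetic dataset, then run a short adaptation pass on the handful of annotated lines available for the target writer.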

Abstract (translated)

Recent advances in Deep Learning-based Handwritten Text Recognition (HTR) have led to models achieving remarkable performance on both modern and historical manuscripts in large benchmark datasets. However, these models struggle to reach the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and the author's handwriting. This issue is especially relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, unfeasible. To overcome this challenge, one possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we consider large real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.

URL

https://arxiv.org/abs/2305.02593

PDF

https://arxiv.org/pdf/2305.02593.pdf

