Paper Reading AI Learner

Writer adaptation for offline text recognition: An exploration of neural network-based methods

2023-07-11 11:35:08
Tobias van der Werff, Maruf A. Dhali, Lambert Schomaker

Abstract

Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well equipped to deal with shifting data distributions. In handwritten text recognition (HTR), this manifests as poor recognition accuracy for writers who differ from those seen during training. An ideal HTR model should adapt to new writing styles in order to handle the vast number of possible writing styles. In this paper, we explore how HTR models can be made writer-adaptive using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, each combining a ResNet backbone with either an LSTM or a Transformer sequence decoder. Two methods are considered to make these base models writer-adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML, known as MetaHTR, improves on the baseline by 1.4 to 2.0 points of word error rate (WER). The improvement attributable to writer adaptation is between 0.2 and 0.7 WER, and a deeper model appears to lend itself better to MetaHTR adaptation than a shallower one. However, applying MetaHTR to larger HTR models or to sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance.
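The few-shot writer adaptation described in the abstract can be illustrated with a small MAML-style sketch. The code below is a first-order approximation in PyTorch and is not the authors' implementation: the base model, its compute_loss method (e.g., a CTC or cross-entropy loss), the learning rates, and the writer-task format are assumptions made here for illustration. The actual MetaHTR method additionally differentiates through the inner update and learns layer-wise learning rates, which this sketch omits.

    # Minimal first-order MAML-style sketch for writer adaptation (PyTorch).
    # All model-specific names (compute_loss, task format) are hypothetical.
    import copy
    import torch

    def adapt_to_writer(base_model, support_images, support_targets,
                        inner_lr=1e-3, inner_steps=1):
        # Inner loop: fine-tune a copy of the base model on a small support
        # set (e.g., 16 labelled word images) from a single new writer.
        adapted = copy.deepcopy(base_model)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            loss = adapted.compute_loss(support_images, support_targets)  # hypothetical loss method
            loss.backward()
            inner_opt.step()
        return adapted

    def meta_train_step(base_model, meta_opt, writer_task, inner_lr=1e-3):
        # Outer loop (first-order approximation): adapt on the writer's
        # support set, evaluate on its query set, and copy the resulting
        # gradients back onto the base model so future adaptation improves.
        (support_x, support_y), (query_x, query_y) = writer_task
        adapted = adapt_to_writer(base_model, support_x, support_y, inner_lr)
        adapted.zero_grad()
        query_loss = adapted.compute_loss(query_x, query_y)
        query_loss.backward()
        meta_opt.zero_grad()
        for p_base, p_adapted in zip(base_model.parameters(), adapted.parameters()):
            p_base.grad = p_adapted.grad.clone()
        meta_opt.step()
        return query_loss.item()

At test time only adapt_to_writer would be needed: given a handful of labelled samples from an unseen writer, it returns a writer-specific copy of the trained model, which is then used to transcribe that writer's remaining text.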


URL

https://arxiv.org/abs/2307.15071

PDF

https://arxiv.org/pdf/2307.15071.pdf

