Abstract
The digitization of historical manuscripts presents significant challenges for Handwritten Text Recognition (HTR) systems, particularly when dealing with small, author-specific collections that diverge from the training data distributions. Handwritten Text Generation (HTG) techniques, which generate synthetic data tailored to specific handwriting styles, offer a promising way to address these challenges. However, the effectiveness of various HTG models in enhancing HTR performance, especially in low-resource transcription settings, has not been thoroughly evaluated. In this work, we systematically compare three state-of-the-art styled HTG models (representing the generative adversarial, diffusion, and autoregressive paradigms for HTG) to assess their impact on HTR fine-tuning. We analyze how the visual and linguistic characteristics of synthetic data influence fine-tuning outcomes and provide quantitative guidelines for selecting the most effective HTG model. The results of our analysis provide insights into the current capabilities of HTG methods and highlight key areas for further improvement in their application to low-resource HTR.
URL
https://arxiv.org/abs/2508.09936