Abstract
We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the steps previously performed by separate models: feature extraction, handwriting recognition, and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at three levels: line, paragraph, and page. Our experiments show that attention-based models are especially well suited to full pages, as they require no prior segmentation step. Finally, we show that they can learn from key-value annotations: a list of important words paired with their corresponding named entities. We compare our models with state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and surpass previous results on all three datasets.
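As a rough illustration of the integrated architecture the abstract describes, the sketch below shows one plausible way to merge feature extraction, handwriting recognition, and named entity recognition into a single attention-based model: a CNN produces visual features from the page image, and a Transformer decoder emits one token stream that interleaves characters with entity-tag tokens. Everything here (module sizes, the tagging scheme, the class name EndToEndHTRNER) is an illustrative assumption, not the authors' actual implementation.

import torch
import torch.nn as nn

class EndToEndHTRNER(nn.Module):
    """Single model merging feature extraction, HTR, and NER (illustrative sketch)."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        # Feature extraction: a small CNN maps the page image to a
        # horizontal sequence of visual feature vectors.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse the height axis
        )
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        # One decoder predicts characters AND entity-tag tokens as a
        # single stream, so recognition and tagging share one model.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image, tgt_tokens):
        # image: (B, 1, H, W); tgt_tokens: (B, T), shifted-right targets.
        feats = self.cnn(image)                    # (B, d_model, 1, W')
        memory = self.encoder(feats.squeeze(2).transpose(1, 2))
        tgt = self.embed(tgt_tokens)
        # Causal mask so each position attends only to earlier tokens.
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float('-inf')), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(h)                         # logits over chars + tags

Under such an assumed scheme, a key-value target sequence might interleave words with tag tokens (e.g. "Antoni <name> farmer <occupation>"), so transcription and entity labeling are learned jointly from a single supervision signal, and no line-level segmentation is needed when operating on full pages.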
URL
https://arxiv.org/abs/2304.13530