Paper Reading AI Learner

GenKIE: Robust Generative Multimodal Document Key Information Extraction

2023-10-24 19:12:56
Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng

Abstract

Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this paper, we propose a novel generative end-to-end model, named GenKIE, to address the KIE task. GenKIE is a sequence-to-sequence multimodal generative model that uses multimodal encoders to embed visual, layout and textual features and a decoder to generate the desired output. Well-designed prompts are leveraged to incorporate label semantics as weakly supervised signals and to elicit the generation of the key information. One notable advantage of the generative model is that it enables automatic correction of OCR errors. Moreover, token-level granular annotation is not required. Extensive experiments on multiple public real-world datasets show that GenKIE effectively generalizes over different types of documents and achieves state-of-the-art results. Our experiments also validate the model's robustness against OCR errors, making GenKIE highly applicable in real-world scenarios.
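The prompt-based generation scheme described above can be illustrated with a minimal sketch. All function names and the prompt template below are assumptions for illustration, not GenKIE's actual implementation; the decoder output is simulated rather than produced by a trained model.

```python
# Hypothetical sketch of prompt-based generative KIE. The input is the OCR'd
# document text concatenated with a prompt per entity type (the prompt carries
# the label semantics); the decoder is expected to generate "type is value"
# pairs, which are then parsed back into structured key information.

def build_prompt(ocr_tokens, entity_types):
    """Join OCR tokens into document text and append one prompt per
    entity type, e.g. 'company is ?' (template is an assumption)."""
    document = " ".join(ocr_tokens)
    prompts = " ".join(f"{etype} is ?" for etype in entity_types)
    return f"{document} {prompts}"

def parse_generation(generated, entity_types):
    """Parse a decoder output like 'company is ACME Corp; date is 2023-10-24'
    into a dict mapping entity type -> extracted value."""
    results = {}
    for segment in generated.split(";"):
        segment = segment.strip()
        for etype in entity_types:
            prefix = f"{etype} is "
            if segment.startswith(prefix):
                results[etype] = segment[len(prefix):]
    return results

# Note the OCR error 'ACM3' in the tokens: because the output is generated
# token by token rather than copied from the input, the decoder can emit the
# corrected surface form 'ACME', which a discriminative tagger cannot.
tokens = ["ACM3", "Corp", "Invoice", "2023-10-24"]
prompt = build_prompt(tokens, ["company", "date"])
generated = "company is ACME Corp; date is 2023-10-24"  # simulated decoder output
print(parse_generation(generated, ["company", "date"]))
```

This also shows why no token-level labels are needed: supervision is the target string itself, not a per-token tag sequence.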

URL

https://arxiv.org/abs/2310.16131

PDF

https://arxiv.org/pdf/2310.16131.pdf
