Abstract
Inspired by the recent successes of deep learning in computer vision and natural language processing, we present a deep learning approach for recognizing scanned receipts. The recognition system has two main modules: text detection based on a Connectionist Text Proposal Network and text recognition based on an attention-based encoder-decoder. We also propose a pre-processing step that extracts the receipt area and an OCR verification step that ignores handwriting. Experiments on the dataset of the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction 2019 show that integrating the pre-processing and the OCR verification improves accuracy. Our recognition system achieved an F1 score of 71.9% on the detection and recognition task.
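For readers unfamiliar with the reported metric, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch, not the challenge's evaluation script: the function name and the counts are hypothetical, and the rule for deciding when a detected and recognized text line counts as correct is defined by the challenge and is not reproduced here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when either is undefined."""
    predicted = true_positives + false_positives  # items the system output
    actual = true_positives + false_negatives     # items in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
score = f1_score(true_positives=3, false_positives=1, false_negatives=3)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F1 by maximizing precision or recall alone.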
URL
https://arxiv.org/abs/1905.12817