Field typing for improved recognition on heterogeneous handwritten forms

2019-09-23 01:29:58

Ciprian Tomoiaga (1), Paul Feng (1), Mathieu Salzmann (2), Patrick Jayet (1) ((1) AXA REV Lausanne, (2) CVLab EPFL Switzerland)

arXiv_CV

Abstract

Offline handwriting recognition has undergone continuous progress over the past decades. However, existing methods are typically benchmarked on free-form text datasets that are biased towards good-quality images and handwriting styles, and towards homogeneous content. In this paper, we show that state-of-the-art algorithms, which employ long short-term memory (LSTM) layers, do not readily generalize to real-world structured documents, such as forms, because the content of these documents is highly heterogeneous, often out-of-vocabulary, and inherently ambiguous. To address this, we propose to leverage the content type within an LSTM-based architecture. Furthermore, we introduce a procedure to generate synthetic data to train this architecture without requiring expensive manual annotations. We demonstrate the effectiveness of our approach at transcribing text on a challenging, real-world dataset of European Accident Statements.
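
The abstract does not say how the content type enters the LSTM-based architecture. As a rough illustration only, one common way to condition a sequence model on a categorical label is to append a one-hot encoding of that label to the feature vector at every timestep before it reaches the recurrent layers. The sketch below shows that generic idea in plain Python; the field type names, the `FIELD_TYPES` list, and the `condition_on_field_type` helper are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical field types, for illustration only. The paper's actual
# set of content types is not given in the abstract.
FIELD_TYPES = ["name", "date", "phone", "license_plate"]


def one_hot(field_type: str, vocabulary: Sequence[str] = FIELD_TYPES) -> List[float]:
    """Encode a field type as a one-hot vector over the vocabulary."""
    if field_type not in vocabulary:
        raise ValueError(f"unknown field type: {field_type!r}")
    return [1.0 if t == field_type else 0.0 for t in vocabulary]


def condition_on_field_type(
    features: Sequence[Sequence[float]], field_type: str
) -> List[List[float]]:
    """Append the field-type one-hot vector to every timestep.

    `features` is a sequence of per-timestep feature vectors, such as the
    column features a convolutional front end might extract from a text
    line image. The returned sequence is what a recurrent layer would
    consume under this (assumed) conditioning scheme.
    """
    type_vector = one_hot(field_type)
    return [list(step) + type_vector for step in features]


if __name__ == "__main__":
    # Two timesteps with two features each, tagged as a "date" field.
    conditioned = condition_on_field_type([[0.1, 0.2], [0.3, 0.4]], "date")
    for step in conditioned:
        print(step)
```

With this kind of conditioning, the same ambiguous stroke pattern can be decoded differently depending on the field, for example favoring digits in a date field and letters in a name field. Whether the authors use concatenation, a learned embedding, or another mechanism is not stated here.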

URL

https://arxiv.org/abs/1909.10120

PDF

https://arxiv.org/pdf/1909.10120.pdf