Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents

2022-05-26 15:37:24

Nguyen Hong Son, Hieu M. Vu, Tuan-Anh D. Nguyen, Minh-Tien Nguyen

arXiv_AI

Abstract

This paper introduces a new information extraction model for business documents. Unlike prior studies, which rely on either span extraction or sequence labeling alone, the model takes advantage of both. The combination allows the model to handle long documents with sparse information, that is, documents in which only a small amount of the text needs to be extracted. The model is trained end-to-end to jointly optimize the two tasks in a unified manner. Experimental results on four business datasets in English and Japanese show that the model achieves promising results and is significantly faster than the standard span-based extraction method. The code is also available.

URL

https://arxiv.org/abs/2205.13434

PDF

https://arxiv.org/pdf/2205.13434.pdf
