TNCR: Table Net Detection and Classification Dataset

2021-06-19 10:48:58

Abdelrahman Abdallah, Alexander Berendeyev, Islam Nuradin, Daniyar Nurseitov

arXiv_AI

Abstract

We present TNCR, a new table dataset of images with varying quality collected from free websites. TNCR can be used to detect tables in scanned document images and to classify them into 5 classes. It contains 9428 high-quality labeled images. We implement state-of-the-art deep learning-based table detection methods to establish several strong baselines. Cascade Mask R-CNN with a ResNeXt-101-64x4d backbone achieves the highest performance among the methods compared, with a precision of 79.7%, a recall of 89.8%, and an F1 score of 84.4% on the TNCR dataset. We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition. The dataset and trained model checkpoints are available at this https URL.
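
As a quick consistency check on the reported numbers, the sketch below recomputes an F1 score from the stated precision and recall using the standard harmonic-mean definition. This is an illustration only: the function name `f1_score` is hypothetical, and the abstract does not say how the reported F1 was aggregated (for example, whether it was averaged over several IoU thresholds), so small rounding differences are to be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract: precision 79.7%, recall 89.8%.
reported_precision = 0.797
reported_recall = 0.898

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

The result is close to the 84.4% quoted in the abstract, which suggests the three reported figures are mutually consistent under this definition.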

URL

https://arxiv.org/abs/2106.15322

PDF

https://arxiv.org/pdf/2106.15322.pdf