Abstract
The automatic recognition of tabular data in document images is challenging because tables vary widely in style and can have complex structures. Tables offer a valuable representation of content that can enhance the predictive capabilities of systems such as search engines and knowledge graphs. The two main subproblems, table detection (TD) and table structure recognition (TSR), have traditionally been addressed independently. In this research, we propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP-OCRv2, to achieve comprehensive image-based table recognition. This integrated approach handles diverse table styles, complex structures, and image distortions, improving accuracy and efficiency compared to existing methods such as Table Transformer. Our system performs TD, TSR, and table content recognition (TCR) simultaneously, preserving table structure while extracting tabular data from document images. By combining multiple models, the approach addresses the intricacies of table recognition, making it a promising solution for image-based table understanding, data extraction, and information retrieval. The proposed approach achieves an IoU of 0.96 and an OCR accuracy of 78%, an improvement of approximately 25% in OCR accuracy over the previous Table Transformer approach.
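The abstract reports an IoU of 0.96 but does not state which regions it is measured over. For reference, the standard Intersection over Union between two axis-aligned bounding boxes can be sketched as follows. This is a minimal illustration that assumes boxes are given as (x_min, y_min, x_max, y_max) pixel coordinates; it is not the authors' evaluation code, and the function names are hypothetical.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; inverted or empty boxes have zero area."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle; it is empty if the boxes are disjoint.
    intersection = (
        max(pred[0], truth[0]),
        max(pred[1], truth[1]),
        min(pred[2], truth[2]),
        min(pred[3], truth[3]),
    )
    inter_area = box_area(intersection)
    # Union = sum of both areas minus the double-counted overlap.
    union_area = box_area(pred) + box_area(truth) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


if __name__ == "__main__":
    # Predicted table box overshoots the ground truth by 2 px on one edge.
    print(iou((0, 0, 100, 50), (0, 0, 100, 48)))
```

For intuition, a 100 × 50 px predicted box that overshoots a 100 × 48 px ground-truth box by 2 px along one edge yields an IoU of 0.96 (4800 / 5000), which shows how little boundary error a score in that range leaves room for.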
Abstract (translated)
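The automatic recognition of tabular data in document images is a highly challenging task because table styles and structures are diverse and complex. Tables provide a valuable representation of content, enhancing the predictive capabilities of systems such as search engines and knowledge graphs. Traditionally, the problems of table detection (TD) and table structure recognition (TSR) have been handled independently. In this study, we propose an end-to-end pipeline that integrates artificial intelligence models, including DETR, CascadeTabNet, and PP-OCRv2, to achieve comprehensive image-based table recognition. This integrated approach effectively handles diverse table styles, complex structures, and image distortions, improving both accuracy and efficiency compared with existing methods such as Table Transformer. Our system simultaneously performs table detection (TD), table structure recognition (TSR), and table content recognition (TCR), preserving table structure and accurately extracting tabular data from document images. Integrating multiple models addresses the complexity of table recognition, making our method a promising solution for image-based table understanding, data extraction, and information retrieval applications. Compared with the previous Table Transformer method, our proposed approach improves OCR accuracy by nearly 25%.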
URL
https://arxiv.org/abs/2404.10305