Paper Reading AI Learner

TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

2024-04-16 06:24:53
Avinash Anand, Raj Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md. Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh

Abstract

The automatic recognition of tabular data in document images presents a significant challenge due to the diverse range of table styles and complex structures. Tables offer valuable content representation, enhancing the predictive capabilities of various systems such as search engines and Knowledge Graphs. Addressing the two main problems, namely table detection (TD) and table structure recognition (TSR), has traditionally been approached independently. In this research, we propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition. This integrated approach effectively handles diverse table styles, complex structures, and image distortions, resulting in improved accuracy and efficiency compared to existing methods like Table Transformers. Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR), preserving table structures and accurately extracting tabular data from document images. The integration of multiple models addresses the intricacies of table recognition, making our approach a promising solution for image-based table understanding, data extraction, and information retrieval applications. Our proposed approach achieves an IOU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.

Abstract (translated)

表格数据在文档图像中的自动识别是一个具有重大挑战性的任务,因为表格样式和复杂结构具有多样性。表格提供了有价值的内容表示,增强了搜索引擎和知识图谱等系统预测能力。传统上,解决表格检测(TD)和表格结构识别(TSR)问题通常是独立处理。在本次研究中,我们提出了一种端到端的管道,整合了包括DETR、CascadeTabNet和PP OCR v2在内的人工智能模型,以实现全面的基于图像的表格识别。这种集成方法有效地处理了多样性的表格样式、复杂结构和图像畸变,使得与现有方法(如Table Transformers)相比,准确性和效率都得到了提高。我们的系统同时实现了表格检测(TD)、表格结构识别(TSR)和表格内容识别(TCR),保留了表格结构并准确从文档图像中提取表格数据。整合多个模型解决了表格识别的复杂性,使我们的方法成为图像为基础的表格理解、数据提取和信息检索应用的的有前景的解决方案。与前Table Transformer方法相比,我们提出的 approach 的 OCR Accuracy 提高了近25%。

URL

https://arxiv.org/abs/2404.10305

PDF

https://arxiv.org/pdf/2404.10305.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot