Abstract
Table detection in document images, the task of identifying and localizing tables, is a crucial step in document processing. Recent advances in deep learning have substantially improved its accuracy, but effective training still relies heavily on large labeled datasets. Several semi-supervised approaches have emerged to overcome this limitation, typically employing CNN-based detectors with anchor proposals and post-processing techniques such as non-maximum suppression (NMS). More recently, the focus has shifted towards transformer-based detectors, which eliminate the need for NMS and rely instead on object queries and attention mechanisms. Previous research has sought to improve transformer-based detectors in two ways: refining the quality of object queries and optimizing attention mechanisms. However, increasing the number of object queries can introduce redundancy, while modifying the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach built on SAM-DETR, a novel method for precise alignment between object queries and target features. Our approach markedly reduces false positives and substantially improves table detection performance, particularly in complex documents with diverse table structures. This work thus enables more efficient and accurate table detection in semi-supervised settings.
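For readers unfamiliar with the NMS post-processing step that CNN-based detectors use and transformer-based detectors avoid, the following is a minimal sketch of the standard greedy algorithm. It is not taken from the paper: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box does not
    exceed `iou_threshold` (an illustrative default, not a paper value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping candidate tables and one separate table.
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(candidates, confidences))
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is kept. Transformer-based detectors sidestep this step by training each object query to predict at most one object.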
URL
https://arxiv.org/abs/2405.00187