Abstract
Scene text detection has recently received significant attention due to its wide range of applications. However, accurate detection in complex scenes containing text of multiple scales, orientations, and curvatures remains a challenge. Numerous detection methods adopt the Vatti clipping (VC) algorithm for multiple-instance training to handle arbitrary-shaped text. Yet we identify a bias resulting from these approaches, which we call the "shrinked kernel". Specifically, it refers to a decrease in accuracy caused by an output that overly favors the text kernel. In this paper, we propose a new approach named Expand Kernel Network (EK-Net), which uses an expand kernel distance to compensate for this deficiency and includes a three-stage regression to complete instance detection. Moreover, EK-Net not only achieves precise positioning of arbitrary-shaped text, but also strikes a trade-off between performance and speed. Evaluation results demonstrate that EK-Net achieves state-of-the-art or competitive performance compared to other advanced methods, e.g., an F-measure of 85.72% at 35.42 FPS on ICDAR 2015 and an F-measure of 85.75% at 40.13 FPS on CTW1500.
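To make the "shrinked kernel" concrete, the sketch below shows how earlier kernel-based detectors commonly choose the inward offset passed to Vatti clipping: d = A(1 - r^2) / L, where A is the polygon area, L its perimeter, and r a shrink ratio. This formula is an assumption drawn from that prior line of work, not something stated in this abstract, and it is not EK-Net's expand kernel distance. The function names and the example box are hypothetical, and the clipping step itself is omitted.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of the closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shrink_offset(points, ratio):
    """Inward offset d = A * (1 - r^2) / L (assumed formulation, see lead-in)."""
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)


# Hypothetical 100 x 20 text box with shrink ratio r = 0.5.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
offset = shrink_offset(box, 0.5)
print(offset)
```

For a thin text box like this one, the offset is a large fraction of the box height, so the training target covers only a narrow central strip of the text. This is the kind of kernel-favoring output that the abstract describes as a source of bias.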
URL
https://arxiv.org/abs/2401.11704