Abstract
Scene text detection has witnessed rapid progress, especially with the recent development of convolutional neural networks. However, two challenges still prevent these algorithms from being deployed in industrial applications. On the one hand, most state-of-the-art algorithms require quadrangular bounding boxes, which are inaccurate for locating texts with arbitrary shapes. On the other hand, two text instances that are close to each other may lead to a single false detection covering both instances. Traditionally, segmentation-based approaches can relieve the first problem but usually fail to solve the second. To address these two challenges, in this paper we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates kernels of different scales for each text instance and gradually expands the minimal-scale kernel to the text instance with its complete shape. Because there are large geometrical margins among the minimal-scale kernels, our method effectively separates close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curved texts, PSENet achieves an F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-the-art algorithms by 6.6%. The code will be released in the future.
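The expansion step described in the abstract can be sketched as follows. This is a minimal reading of the idea, not the authors' released code: instance labels are seeded by connected components on the minimal kernel (where instances are well separated) and then grown by breadth-first search through each successively larger kernel; the function names and the list-of-lists mask representation are illustrative assumptions.

```python
from collections import deque

def connected_components(mask):
    """4-connected component labelling on a binary grid (list of lists)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                q = deque([(r, c)])
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
    return labels

def progressive_scale_expansion(kernels):
    """Sketch of progressive scale expansion (illustrative, not the official
    implementation). `kernels` is a list of binary masks of equal size,
    ordered from the minimal kernel to the full text region."""
    h, w = len(kernels[0]), len(kernels[0][0])
    # Seed one label per instance on the minimal kernel, where the large
    # geometrical margins keep close text instances apart.
    labels = connected_components(kernels[0])
    for kernel in kernels[1:]:
        # Breadth-first expansion into the next larger kernel; a contested
        # pixel goes to whichever instance reaches it first.
        q = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
        while q:
            r, c = q.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and kernel[nr][nc] and labels[nr][nc] == 0:
                    labels[nr][nc] = labels[r][c]
                    q.append((nr, nc))
    return labels
```

On a toy one-row image whose full-text mask merges two adjacent instances, the two separated seeds in the minimal kernel keep the instances distinct after expansion, which is exactly how the method avoids the second failure mode described above.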
URL
https://arxiv.org/abs/1903.12473