Shape Robust Text Detection with Progressive Scale Expansion Network

2019-03-28 06:04:44
Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

Abstract

Scene text detection has progressed rapidly, especially with the recent development of convolutional neural networks. However, two challenges still keep these algorithms out of industrial applications. First, most state-of-the-art algorithms require quadrangular bounding boxes, which cannot accurately locate text of arbitrary shape. Second, two text instances that are close to each other may produce a single false detection covering both. Segmentation-based approaches traditionally relieve the first problem but usually fail to solve the second. To address both challenges, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances of arbitrary shape. More specifically, PSENet generates kernels at several scales for each text instance and gradually expands the minimal-scale kernel to the complete shape of the text instance. Because there are large geometrical margins between the minimal-scale kernels, our method effectively separates close text instances, making it easier to use segmentation-based methods to detect arbitrarily shaped text. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curved text, PSENet achieves an F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-the-art algorithms by 6.6%. The code will be released in the future.
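
The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network has already produced one binary mask per kernel scale, ordered from smallest to largest, and it assumes a breadth-first expansion with 4-connectivity in which a contested pixel goes to whichever instance reaches it first. The function names `label_components` and `progressive_scale_expansion` are hypothetical.

```python
from collections import deque

# 4-connected neighbourhood (an assumption of this sketch).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label 4-connected components of a binary mask; 0 means background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel out to the largest.

    kernels: list of binary masks (lists of lists), smallest scale first.
    Returns a label map where each text instance has its own positive id.
    """
    # Each connected component of the smallest kernel seeds one instance.
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start the breadth-first expansion from every labelled pixel.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                # Claim only unlabelled pixels inside the current kernel,
                # so the first instance to arrive keeps the pixel.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


# Two seeds whose full-scale masks touch: the instances stay separate.
smallest = [[0, 1, 0, 0, 0, 1, 0]]
largest = [[1, 1, 1, 1, 1, 1, 1]]
print(progressive_scale_expansion([smallest, largest]))
# prints [[1, 1, 1, 1, 2, 2, 2]]
```

In this toy input the largest mask is a single connected region, so plain connected-component labelling on it would merge the two instances. Seeding from the smallest kernels is what keeps them apart.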

URL

https://arxiv.org/abs/1903.12473

PDF

https://arxiv.org/pdf/1903.12473.pdf

