Paper Reading AI Learner

Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes

2019-04-13 12:50:24
Chengquan Zhang, Borong Liang, Zuming Huang, Mengyi En, Junyu Han, Errui Ding, Xinghao Ding

Abstract

Previous scene text detection methods have progressed substantially over the past years. However, limited by the receptive field of CNNs and by the simple representations, such as rectangular bounding boxes or quadrangles, adopted to describe text, previous methods may fall short when dealing with more challenging text instances, such as extremely long text and arbitrarily shaped text. To address these two problems, we present a novel text detector, namely LOMO, which localizes the text progressively multiple times (in other words, LOok More than Once). LOMO consists of a direct regressor (DR), an iterative refinement module (IRM) and a shape expression module (SEM). First, text proposals in the form of quadrangles are generated by the DR branch. Next, the IRM progressively perceives the entire long text by iterative refinement based on the extracted feature blocks of the preliminary proposals. Finally, an SEM is introduced to reconstruct a more precise representation of irregular text by considering the geometric properties of the text instance, including the text region, text center line and border offsets. State-of-the-art results on several public benchmarks, including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015 and ICDAR17-MLT, confirm the striking robustness and effectiveness of LOMO.
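
The abstract names the SEM's outputs (text region, text center line, border offsets) without saying how they become a shape. The following is a minimal sketch of one plausible reconstruction step, not the authors' implementation. It assumes the center line has already been extracted as an ordered polyline, and that a hypothetical `offset_fn` returns, for a given point, its offsets to the upper and lower text borders. The names `resample_polyline`, `reconstruct_polygon`, `offset_fn` and the default `n=7` are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical: maps a center-line point to (upper_offset, lower_offset).
OffsetFn = Callable[[Point], Tuple[Point, Point]]


def resample_polyline(polyline: List[Point], n: int) -> List[Point]:
    """Pick n points spaced evenly by arc length along an ordered polyline."""
    if len(polyline) < 2 or n < 2:
        raise ValueError("need at least 2 polyline points and n >= 2")
    seg = [math.dist(a, b) for a, b in zip(polyline, polyline[1:])]
    total = sum(seg)
    samples: List[Point] = []
    j, acc = 0, 0.0  # current segment index and arc length before it
    for i in range(n):
        t = total * i / (n - 1)  # target arc length for this sample
        # Advance to the segment that contains arc length t.
        while j < len(seg) - 1 and acc + seg[j] < t:
            acc += seg[j]
            j += 1
        frac = (t - acc) / seg[j] if seg[j] > 0 else 0.0
        frac = min(max(frac, 0.0), 1.0)
        (ax, ay), (bx, by) = polyline[j], polyline[j + 1]
        samples.append((ax + frac * (bx - ax), ay + frac * (by - ay)))
    return samples


def reconstruct_polygon(center_line: List[Point], offset_fn: OffsetFn,
                        n: int = 7) -> List[Point]:
    """Build a closed text polygon from center-line samples and border offsets."""
    upper: List[Point] = []
    lower: List[Point] = []
    for x, y in resample_polyline(center_line, n):
        (dxu, dyu), (dxl, dyl) = offset_fn((x, y))
        upper.append((x + dxu, y + dyu))
        lower.append((x + dxl, y + dyl))
    # Upper border left to right, then lower border right to left,
    # so the vertices trace the outline in one consistent direction.
    return upper + lower[::-1]
```

For example, a horizontal center line from `(0, 10)` to `(60, 10)` with constant offsets of 5 pixels above and below yields an axis-aligned box with four vertices on each border when `n=4`. Sampling a fixed number of points keeps the polygon size constant regardless of text length, while a curved center line produces a curved outline.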

URL

https://arxiv.org/abs/1904.06535

PDF

https://arxiv.org/pdf/1904.06535.pdf
