Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes

Abstract

Previous scene text detection methods have progressed substantially over the past years. However, limited by the receptive field of CNNs and by the simple representations adopted to describe text, such as rectangular bounding boxes or quadrangles, previous methods may fall short when dealing with more challenging text instances, such as extremely long text and arbitrarily shaped text. To address these two problems, we present a novel text detector, namely LOMO, which localizes the text progressively multiple times (in other words, LOok More than Once). LOMO consists of a direct regressor (DR), an iterative refinement module (IRM) and a shape expression module (SEM). First, the DR branch generates text proposals in the form of quadrangles. Next, the IRM progressively perceives the entire long text by iterative refinement based on feature blocks extracted from the preliminary proposals. Finally, the SEM reconstructs a more precise representation of irregular text by considering the geometric properties of the text instance, including the text region, the text center line and the border offsets. State-of-the-art results on several public benchmarks, including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015 and ICDAR17-MLT, confirm the striking robustness and effectiveness of LOMO.
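
The geometry behind the IRM and SEM stages can be illustrated with a small sketch. This is not the authors' implementation: the learned refinement head is replaced by a hypothetical `predict_offsets` callable, and the SEM part assumes the border offsets are 2D vectors from sampled center-line points to the upper and lower text borders, with the polygon formed by joining the upper points and the reversed lower points. The function names `refine_quadrangle` and `reconstruct_polygon` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def refine_quadrangle(
    quad: Sequence[Point],
    predict_offsets: Callable[[List[Point]], Sequence[Point]],
    iterations: int = 2,
) -> List[Point]:
    """IRM-style loop (illustrative): repeatedly shift the four corners.

    `predict_offsets` is a hypothetical stand-in for the learned refinement
    head; it maps the current quadrangle to one (dx, dy) offset per corner.
    """
    current = list(quad)
    for _ in range(iterations):
        offsets = predict_offsets(current)
        current = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, offsets)]
    return current


def reconstruct_polygon(
    center_points: Sequence[Point],
    upper_offsets: Sequence[Point],
    lower_offsets: Sequence[Point],
) -> List[Point]:
    """SEM-style reconstruction (illustrative): center line + border offsets -> polygon.

    Assumes each offset is a 2D vector from a sampled center-line point to the
    corresponding point on the upper or lower text border.
    """
    if not (len(center_points) == len(upper_offsets) == len(lower_offsets)):
        raise ValueError("center points and offsets must have the same length")
    upper = [(cx + ux, cy + uy) for (cx, cy), (ux, uy) in zip(center_points, upper_offsets)]
    lower = [(cx + lx, cy + ly) for (cx, cy), (lx, ly) in zip(center_points, lower_offsets)]
    # Upper border left-to-right, then lower border right-to-left, so the
    # vertices trace a single closed outline.
    return upper + lower[::-1]


if __name__ == "__main__":
    def widen(quad: List[Point]) -> List[Point]:
        # Toy predictor: push left corners left and right corners right by 1.
        return [(-1, 0), (1, 0), (1, 0), (-1, 0)]

    # A proposal covering only part of a long word grows on each step.
    proposal = [(0, 0), (10, 0), (10, 5), (0, 5)]
    print(refine_quadrangle(proposal, widen))
    # -> [(-2, 0), (12, 0), (12, 5), (-2, 5)]

    # A gently curved center line with a constant half-height of 5.
    center = [(0, 10), (10, 12), (20, 10)]
    print(reconstruct_polygon(center, [(0, -5)] * 3, [(0, 5)] * 3))
    # -> [(0, 5), (10, 7), (20, 5), (20, 15), (10, 17), (0, 15)]
```

The split mirrors the division of labour described in the abstract: the quadrangle loop targets long text that a single regression pass may truncate, while the center-line polygon handles curved text that no quadrangle can fit tightly.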

URL

https://arxiv.org/abs/1904.06535

PDF

https://arxiv.org/pdf/1904.06535.pdf