Abstract
Scene text detection methods based on neural networks have emerged recently and have shown promising results. Previous methods trained with rigid word-level bounding boxes exhibit limitations in representing text regions of arbitrary shape. In this paper, we propose a new scene text detection method that effectively detects text areas by exploring each character and the affinity between characters. To overcome the lack of individual character-level annotations, our proposed framework exploits both the given character-level annotations for synthetic images and the estimated character-level ground truths for real images acquired by the learned interim model. To estimate the affinity between characters, the network is trained with a newly proposed representation for affinity. Extensive experiments on six benchmarks, including the TotalText and CTW-1500 datasets which contain highly curved texts in natural images, demonstrate that our character-level text detection significantly outperforms state-of-the-art detectors. According to the results, our proposed method provides high flexibility in detecting complicated scene text, such as arbitrarily oriented, curved, or deformed texts.
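As a rough illustration of the grouping idea described above (not the paper's actual implementation), per-pixel character-region and affinity scores can be thresholded and merged by connected components, so that characters linked by high affinity fall into one word-level region. The function name, threshold, and score maps below are placeholders for this sketch.

```python
import numpy as np

def link_characters(region, affinity, thresh=0.5):
    """Sketch of character-to-word grouping: pixels where either the
    character-region score or the affinity score exceeds `thresh` are
    merged via 4-connected component labelling. Returns the label map
    and the number of word-level components found."""
    mask = np.maximum(region, affinity) >= thresh
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if mask[y, x] and labels[y, x] == 0:
                count += 1
                stack = [(y, x)]          # flood-fill this component
                labels[y, x] = count
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < mask.shape[0]
                                and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            stack.append((ny, nx))
    return labels, count
```

With two isolated character blobs and no affinity response, the sketch yields two components; adding an affinity ridge between them merges them into a single word region, which is the intuition behind detecting arbitrarily shaped text at the character level.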
URL
https://arxiv.org/abs/1904.01941