TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

2018-07-04 12:37:07
Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Abstract

Driven by deep neural networks and large-scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widely used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.
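
The disk-based representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Disk`, `sample_arc_text`, `covers` and `consecutive_overlap` are hypothetical, the disk parameters are hand-picked rather than predicted by an FCN, and the curved text instance is simply assumed to follow a quarter-circle arc.

```python
import math
from dataclasses import dataclass


@dataclass
class Disk:
    """One disk of the sequence (hypothetical structure, for illustration only)."""
    cx: float      # x coordinate of the disk center on the text center line
    cy: float      # y coordinate of the disk center
    radius: float  # local half-height of the text region
    theta: float   # orientation of the center line at this disk, in radians


def sample_arc_text(n_disks=9, arc_radius=50.0, text_radius=8.0):
    """Build an ordered list of disks along a quarter-circle arc.

    Assumption: the curved text follows a circular arc centered at the origin.
    """
    disks = []
    for i in range(n_disks):
        angle = (math.pi / 2) * i / (n_disks - 1)
        cx = arc_radius * math.cos(angle)
        cy = arc_radius * math.sin(angle)
        # The tangent of a circle at `angle` points along angle + pi/2.
        disks.append(Disk(cx, cy, text_radius, angle + math.pi / 2))
    return disks


def covers(disks, x, y):
    """Return True if point (x, y) lies inside the union of the disks."""
    return any((x - d.cx) ** 2 + (y - d.cy) ** 2 <= d.radius ** 2 for d in disks)


def consecutive_overlap(disks):
    """Return True if every pair of neighboring disks overlaps."""
    return all(
        math.hypot(a.cx - b.cx, a.cy - b.cy) < a.radius + b.radius
        for a, b in zip(disks, disks[1:])
    )


if __name__ == "__main__":
    disks = sample_arc_text()
    print(len(disks), consecutive_overlap(disks))
    print(covers(disks, 50.0, 0.0), covers(disks, 0.0, 0.0))
```

In this sketch the text region is the union of the disks, so a curved instance is covered without forcing it into a rectangle or quadrangle; the per-disk `theta` is stored but not otherwise used here.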

URL

https://arxiv.org/abs/1807.01544

PDF

https://arxiv.org/pdf/1807.01544.pdf

