Paper Reading AI Learner

Progressive Evolution from Single-Point to Polygon for Scene Text

2023-12-21 12:08:27
Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai

Abstract

The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce costs, yet they lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which efficiently transforms single points into compact polygons. Our method uses a coarse-to-fine process: it first creates and selects anchor points based on recognition confidence, then refines the polygon vertically and horizontally using recognition information to optimize its shape. We demonstrate the accuracy of the generated polygons through extensive experiments: 1) by creating polygons from ground-truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) in training detectors with polygons generated by our method, we attained 86% of the accuracy of training with ground truth (GT); 3) additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons, yielding an impressive 82.5% accuracy for the generated polygons. Notably, our method relies solely on synthetic recognition information, eliminating the need for any manual annotation beyond single points.
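The coarse-to-fine idea the abstract describes — grow a region outward from a single point, scoring candidate expansions with a recognizer's confidence and refining vertically and horizontally — can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the authors' implementation: the greedy expansion strategy, the function names, and the stand-in confidence function (the paper uses an actual text recognizer, and produces general polygons rather than the axis-aligned boxes used here for simplicity).

```python
# Hedged sketch of a coarse-to-fine point-to-region expansion.
# recognition_confidence is a stand-in for a text recognizer scoring a crop.

def recognition_confidence(box):
    """Toy scorer: IoU of the candidate box (x0, y0, x1, y1) against a
    hypothetical ground-truth text region, mimicking a recognizer whose
    confidence peaks when the crop tightly covers the text."""
    gt = (10, 20, 90, 40)  # hypothetical true text region
    ix0, iy0 = max(box[0], gt[0]), max(box[1], gt[1])
    ix1, iy1 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    return inter / union if union else 0.0

def point_to_box(point, step=2, max_iter=200):
    """Grow a region around a single annotated point, greedily moving
    whichever edge (left/right = horizontal refinement, top/bottom =
    vertical refinement) most increases the recognizer's confidence;
    stop when no single-edge move improves the score."""
    x, y = point
    box = [x - 1, y - 1, x + 1, y + 1]
    for _ in range(max_iter):
        best, best_score = None, recognition_confidence(tuple(box))
        # Candidate moves: expand left, top, right, bottom edges.
        for i, d in ((0, -step), (1, -step), (2, step), (3, step)):
            cand = box[:]
            cand[i] += d
            score = recognition_confidence(tuple(cand))
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            break  # converged: no edge move raises confidence
        box = best
    return tuple(box)

print(point_to_box((50, 30)))
```

With the toy scorer above, the box grows from the seed point until each edge reaches the hypothetical text region, at which point further expansion lowers the score and the loop stops. In the actual method, the recognition confidence would come from running a recognizer on the cropped image region.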


URL

https://arxiv.org/abs/2312.13778

PDF

https://arxiv.org/pdf/2312.13778.pdf

