Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling

2024-01-08 02:47:47
Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin

Abstract

Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS, which can effectively spot inverse-like scene texts without sacrificing general ones. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) for adapting to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that can dynamically sample appropriate features for recognition in the detected text region. Without extra supervision, the DSM can proactively learn to sample appropriate features for text recognition through the gradient returned by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance on both irregular and inverse-like text spotting.
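The step of dividing a boundary into two symmetric sets of control points can be illustrated with a small sketch. This is not the authors' implementation: the function name `split_boundary`, the `start` index (standing in for whatever a reading-order estimate would provide), and the assumption that the boundary is a closed polygon with an even number of vertices are all hypothetical choices made for illustration.

```python
def split_boundary(points, start):
    """Split a closed boundary into two paired control-point sequences.

    points: list of (x, y) vertices ordered around the text contour
            (assumed even length, at least 4).
    start:  index of the vertex where reading begins; a hypothetical
            stand-in for the output of a reading-order estimate.
    Returns (top, bottom), where top[i] and bottom[i] face each other
    across the text line.
    """
    n = len(points)
    if n < 4 or n % 2 != 0:
        raise ValueError("expected an even number of boundary points (>= 4)")

    # Rotate the contour so the reading start comes first.
    k = start % n
    rolled = points[k:] + points[:k]

    half = n // 2
    top = rolled[:half]
    # The second half runs back toward the start, so reverse it
    # to line up with the first half point by point.
    bottom = rolled[half:][::-1]
    return top, bottom


if __name__ == "__main__":
    box = [(0, 0), (10, 0), (10, 2), (0, 2)]
    print(split_boundary(box, 0))  # upright reading direction
    print(split_boundary(box, 2))  # same box, read as if rotated by 180 degrees
```

With the same four-point box, changing only `start` flips which side is treated as the first sequence and in which direction it is traversed, which is the kind of ambiguity a reading-order estimate would have to resolve for mirrored or upside-down text.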

URL

https://arxiv.org/abs/2401.03637

PDF

https://arxiv.org/pdf/2401.03637.pdf
