Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling

Abstract

Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified, end-to-end trainable inverse-like antagonistic text spotting framework, dubbed IATS, which can effectively spot inverse-like scene text without sacrificing performance on general text. Specifically, we propose a reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To train REM, we propose a joint reading-order estimation loss consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) that adapts to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that dynamically samples appropriate features for recognition within the detected text region. Without extra supervision, the DSM learns to sample appropriate features for text recognition through the gradients back-propagated from the recognition module. Extensive experiments on both challenging scene text datasets and inverse-like scene text datasets demonstrate that our method achieves superior performance on both irregular and inverse-like text spotting.
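
The abstract mentions thin-plate spline (TPS) sampling but gives no implementation details. As a rough illustration only, the sketch below fits a generic 2D TPS that maps control points on a rectified unit rectangle to points on a curved text boundary, then warps a regular grid to get sampling locations. It is not the authors' DSM: the function names (`fit_tps`, `apply_tps`), the control-point layout, and the example boundary are all assumptions, and the learned, recognition-driven adjustment of the sampling points is not shown.

```python
import numpy as np

def _tps_kernel(d2):
    # Radial basis U(r) = r^2 * log(r^2), taking squared distances; U(0) = 0.
    out = np.zeros_like(d2)
    mask = d2 > 0
    out[mask] = d2[mask] * np.log(d2[mask])
    return out

def fit_tps(src, dst):
    # Solve the standard TPS linear system for n control points.
    # src, dst: float arrays of shape (n, 2). Returns parameters of shape (n + 3, 2).
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((n, 1)), src])      # affine part: [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = _tps_kernel(d2)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    return np.linalg.solve(L, rhs)

def apply_tps(params, src, pts):
    # Map arbitrary points pts (m, 2) through the fitted spline.
    n = src.shape[0]
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return _tps_kernel(d2) @ params[:n] + P @ params[n:]

# Hypothetical example: 4 control points on each long side of a unit rectangle,
# mapped to the upper and lower edges of an arched text region in image space.
xs = np.linspace(0.0, 1.0, 4)
src = np.array([[x, 0.0] for x in xs] + [[x, 1.0] for x in xs])
dst = np.array(
    [[10 + 30 * x, 20 - 8 * np.sin(np.pi * x)] for x in xs]
    + [[10 + 30 * x, 32 - 8 * np.sin(np.pi * x)] for x in xs]
)
params = fit_tps(src, dst)

# Warp a 4 x 16 rectified grid into image coordinates.
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 16), np.linspace(0.0, 1.0, 4))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
sample_xy = apply_tps(params, src, grid)
```

Here `sample_xy` holds one image-space coordinate per rectified grid cell; a spotting pipeline would typically read feature-map values at those coordinates (for example by bilinear interpolation) and pass the result to the recognizer.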

URL

https://arxiv.org/abs/2401.03637

PDF

https://arxiv.org/pdf/2401.03637.pdf
