Abstract
Scene-text spotting is the task of detecting text regions in natural scene images and simultaneously recognizing the characters they contain. It has attracted much attention in recent years because of its wide range of applications. Existing research has mainly focused on improving text region detection rather than text recognition; consequently, while detection accuracy has improved, end-to-end accuracy remains insufficient. Text in natural scene images tends not to be a random string of characters but a meaningful one, i.e., a word. We therefore propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. Instead of performing text recognition based only on visual features, A3S simultaneously predicts semantic features in the detected text region. Experimental results on publicly available datasets show that the proposed method achieves better accuracy than other methods.
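The abstract gives no implementation details, but the core idea — training a predictor of semantic features adversarially against a discriminator that distinguishes predicted features from real word-level semantic features — can be sketched in a toy form. Everything below (the vector sizes, the single-weight-vector discriminator, the function names) is a hypothetical illustration of a standard GAN-style objective, not the authors' actual A3S architecture.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(feature, w):
    # Toy discriminator: a single logistic unit scoring how "real"
    # a semantic feature vector looks (1 = real word embedding).
    return sigmoid(dot(feature, w))

def adversarial_losses(real_feat, fake_feat, w):
    """Binary cross-entropy losses for a GAN-style setup in which the
    spotter's predicted semantic feature (fake_feat) is pitted against
    a real word-embedding feature (real_feat)."""
    d_real = discriminator(real_feat, w)
    d_fake = discriminator(fake_feat, w)
    # Discriminator objective: score real features as 1, predicted as 0.
    loss_d = -math.log(d_real) - math.log(1.0 - d_fake)
    # Predictor (generator) objective: fool the discriminator.
    loss_g = -math.log(d_fake)
    return loss_d, loss_g
```

In a full system these losses would be added to the usual detection and recognition losses and minimized jointly, so the recognizer's features are pushed toward the semantic space of real words; here the vectors and weights are fixed toy values.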
URL
https://arxiv.org/abs/2302.10641