Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss

Abstract

Scene text recognition is an important and challenging task in computer vision. However, most prior work focuses on recognizing pre-defined words, whereas real-world applications contain many out-of-vocabulary (OOV) words. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To solve this problem, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce substantial pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds, so it better simulates real-world applications. Second, to reduce noise in the pseudo data, we present a semantic checking mechanism that retains only semantically meaningful samples. Third, we introduce a quality-aware margin loss to improve training with pseudo data. Our loss includes a margin-based part to enhance classification ability and a quality-aware part to penalize low-quality samples in both real and pseudo data. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on eight datasets and ranks first in the ICDAR2022 challenge.
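The abstract does not give the form of the quality-aware margin loss, so the following is only an illustrative sketch of the general idea and not the authors' formulation. It assumes an additive margin subtracted from the target-class logit, and it interprets the quality-aware part as a per-sample weight in [0, 1] that scales each sample's contribution. The function names, the margin value, and the weighting scheme are all hypothetical.

```python
import math


def margin_cross_entropy(logits, target, margin=0.2):
    """Cross-entropy after subtracting `margin` from the target-class logit.

    Assumption: a generic additive-margin softmax, not the paper's exact loss.
    """
    adjusted = [z - margin if i == target else z for i, z in enumerate(logits)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_norm - adjusted[target]


def quality_weighted_loss(samples, margin=0.2):
    """Quality-weighted average of the margin loss.

    `samples` is a list of (logits, target, quality) tuples, where `quality`
    is a hypothetical score in [0, 1] for a real or pseudo sample.
    """
    total_weight = sum(quality for _, _, quality in samples)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        quality * margin_cross_entropy(logits, target, margin)
        for logits, target, quality in samples
    )
    return weighted / total_weight


if __name__ == "__main__":
    batch = [
        ([2.0, 0.5, -1.0], 0, 1.0),  # clean sample, full weight
        ([0.1, 0.3, 0.2], 2, 0.3),   # noisy pseudo sample, reduced weight
    ]
    print(quality_weighted_loss(batch))
```

In this sketch a larger margin makes the target class harder to satisfy, which raises the loss for the same logits, while a quality score of zero removes a sample from the average entirely.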

URL

https://arxiv.org/abs/2403.07518

PDF

https://arxiv.org/pdf/2403.07518.pdf
