Abstract
Scene text recognition is an important and challenging task in computer vision. However, most prior work focuses on recognizing words from a pre-defined vocabulary, while real-world applications contain many out-of-vocabulary (OOV) words. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To solve this problem, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce substantial pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds, so it better simulates real-world applications. Second, to reduce noise in the pseudo data, we present a semantic checking mechanism that keeps only semantically meaningful samples. Third, we introduce a quality-aware margin loss to improve training with pseudo data. Our loss includes a margin-based part to enhance classification ability, and a quality-aware part to penalize low-quality samples in both real and pseudo data. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on eight datasets and ranks first in the ICDAR2022 challenge.
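The abstract does not give the formula for the quality-aware margin loss, so the following is only a minimal sketch of how a margin term and a quality term could be combined, not the paper's formulation. It assumes an additive margin subtracted from the target-class logit and a per-sample quality score in [0, 1] used as a weight, so low-quality samples contribute less. The function name, the `margin` default, and the weighting scheme are all hypothetical.

```python
import math


def quality_aware_margin_loss(logits, labels, quality, margin=0.5):
    """Hypothetical sketch: additive-margin cross-entropy, weighted by quality.

    logits  : list of per-sample class score lists
    labels  : list of target class indices
    quality : list of per-sample quality scores in [0, 1] (assumed input)
    margin  : additive margin on the target logit (assumed form)
    """
    total = 0.0
    weight_sum = 0.0
    for row, label, q in zip(logits, labels, quality):
        # Margin part: lower the target-class logit so the model must
        # separate the correct class by at least `margin`.
        adjusted = list(row)
        adjusted[label] -= margin

        # Numerically stable log-sum-exp over the adjusted logits.
        peak = max(adjusted)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in adjusted))
        sample_loss = log_norm - adjusted[label]  # cross-entropy for this sample

        # Quality part (assumption): weight each sample by its quality score.
        total += q * sample_loss
        weight_sum += q

    # Quality-weighted mean; an all-zero quality vector yields a zero loss.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 1.0]]
    labels = [0, 1]
    print(quality_aware_margin_loss(logits, labels, [1.0, 0.3]))
```

With `margin=0.0` and all quality scores equal to 1, the sketch reduces to ordinary mean cross-entropy, which makes it easy to check against a baseline before experimenting with either term.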
URL
https://arxiv.org/abs/2403.07518