Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

Abstract

Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of text images, and the resulting images significantly affect the performance of downstream tasks. Although considerable progress has been made, existing approaches suffer from two crucial issues: (1) They neglect the global structure of the text, which bounds the semantic determinism of the scene text. (2) The priors employed in existing works, e.g., text prior or stroke prior, are extracted from pre-trained text recognizers. Such priors suffer from a domain gap, including the low resolution and blurriness caused by poor imaging conditions, which leads to incorrect guidance. Our work addresses these gaps and proposes a plug-and-play module dubbed Dual Prior Modulation Network (DPMN), which leverages dual image-level priors to bring performance gains over existing approaches. Specifically, two types of prior-guided refinement modules, each using the text mask or graphic recognition result of the low-quality SR image from the preceding layer, are designed to improve the structural clarity and semantic accuracy of the text, respectively. A subsequent attention mechanism then modulates the two quality-enhanced images to attain a superior SR result. Extensive experiments validate that our method improves the image quality and boosts the performance of downstream tasks over five typical approaches on the benchmark. Substantial visualizations and ablation studies demonstrate the advantages of the proposed DPMN. Code is available at: this https URL.
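
The final modulation step, in which two quality-enhanced images are combined by attention, can be illustrated with a small sketch. This is not the DPMN implementation: the abstract does not specify how the attention is computed, so the per-pixel softmax blend below, the function names `softmax_pair` and `modulate`, and the score maps passed in as plain lists are all assumptions made for illustration. In a real model the scores would presumably be learned rather than supplied by hand.

```python
import math


def softmax_pair(a, b):
    """Turn two scores into two non-negative weights that sum to 1."""
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def modulate(img_structure, img_semantic, score_structure, score_semantic):
    """Blend two same-sized grayscale images pixel by pixel.

    img_structure / img_semantic: hypothetical outputs of the two
    prior-guided refinement branches, as 2D lists of floats.
    score_structure / score_semantic: hypothetical per-pixel attention
    scores (assumed given here; a real network would predict them).
    """
    fused = []
    for row_a, row_b, row_sa, row_sb in zip(
        img_structure, img_semantic, score_structure, score_semantic
    ):
        fused_row = []
        for pa, pb, sa, sb in zip(row_a, row_b, row_sa, row_sb):
            wa, wb = softmax_pair(sa, sb)
            fused_row.append(wa * pa + wb * pb)
        fused.append(fused_row)
    return fused
```

With equal scores the two images are simply averaged; as one score grows relative to the other, the output at that pixel moves toward the corresponding image.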

Abstract (translated)

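Scene text image super-resolution (STISR) aims to improve both the resolution and the readability of text images, and the resulting images significantly affect the performance of subsequent tasks. Although much progress has been made, existing methods have two key problems: (1) They ignore the global structure of the text, which limits the semantic determinism of the scene text. (2) The priors used in existing works, such as the text prior or stroke prior, are extracted from pre-trained text recognizers. These priors are affected by a domain gap, including the low resolution and blur caused by poor imaging conditions, which leads to incorrect guidance. Our work addresses these gaps and proposes a plug-and-play module named Dual Prior Modulation Network (DPMN), which uses dual image-level priors to improve performance over existing methods. Specifically, there are two types of prior-guided refinement modules, each using the text mask or graphic recognition result of the low-quality SR image from the preceding layer, to improve the structural clarity and semantic accuracy of the text, respectively. The subsequent attention mechanism then modulates the two quality-enhanced images to obtain a better SR result. Extensive experiments confirm that our method improves image quality and improves the performance of subsequent tasks over five typical methods on the benchmark. Extensive visualizations and ablation studies demonstrate the advantages of the proposed DPMN. Code is available at: this https URL.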

URL

https://arxiv.org/abs/2302.10414

PDF

https://arxiv.org/pdf/2302.10414.pdf