Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

2023-02-21 02:59:37
Shipeng Zhu, Zuoyan Zhao, Pengfei Fang, Hui Xue

Abstract

Scene text image super-resolution (STISR) aims to simultaneously increase the resolution and legibility of text images, and the resulting images significantly affect the performance of downstream tasks. Although considerable progress has been made, existing approaches have two crucial shortcomings: (1) They neglect the global structure of the text, which bounds the semantic determinism of the scene text. (2) The priors employed in existing works, e.g., text prior or stroke prior, are extracted from pre-trained text recognizers. Such priors suffer from a domain gap, including the low resolution and blurriness caused by poor imaging conditions, which leads to incorrect guidance. Our work addresses these gaps and proposes a plug-and-play module dubbed Dual Prior Modulation Network (DPMN), which leverages dual image-level priors to bring performance gains over existing approaches. Specifically, two types of prior-guided refinement modules, each using the text mask or graphic recognition result of the low-quality SR image from the preceding layer, are designed to improve the structural clarity and semantic accuracy of the text, respectively. A subsequent attention mechanism then modulates the two quality-enhanced images to attain a superior SR result. Extensive experiments validate that our method improves image quality and boosts the performance of downstream tasks over five typical approaches on the benchmark. Substantial visualizations and ablation studies demonstrate the advantages of the proposed DPMN. Code is available at: this https URL.
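
The final step described in the abstract, an attention mechanism that modulates two quality-enhanced images into one SR result, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`softmax`, `fuse_refined_images`), the per-pixel softmax weighting, and the `scores` input are all assumptions made for illustration. In the actual DPMN the attention weights would come from learned layers, whereas here they are passed in as plain arrays.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_refined_images(structure_img, semantic_img, scores):
    """Blend two refined images with per-pixel attention weights.

    structure_img: (H, W) image from a structure-oriented (text mask) branch.
    semantic_img:  (H, W) image from a semantics-oriented (recognition) branch.
    scores:        (2, H, W) hypothetical unnormalized attention scores,
                   one map per branch (an assumption, not from the paper).
    Returns the fused (H, W) image and the (2, H, W) normalized weights.
    """
    candidates = np.stack([structure_img, semantic_img], axis=0)  # (2, H, W)
    weights = softmax(scores, axis=0)  # weights sum to 1 at every pixel
    fused = (weights * candidates).sum(axis=0)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = rng.random((16, 64))  # stand-in for the structure-refined image
    img_b = rng.random((16, 64))  # stand-in for the semantics-refined image
    scores = rng.normal(size=(2, 16, 64))
    fused, weights = fuse_refined_images(img_a, img_b, scores)
    print(fused.shape, weights.sum(axis=0).mean())
```

Because the weights are normalized per pixel, each fused pixel is a convex combination of the two candidates, so the output can favor whichever branch is scored higher at that location.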

URL

https://arxiv.org/abs/2302.10414

PDF

https://arxiv.org/pdf/2302.10414.pdf

