Text Image Inpainting via Global Structure-Guided Diffusion Models

2024-01-26 13:01:28
Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, Hui Xue

Abstract

Real-world text can be damaged by corrosion issues caused by environmental or human factors, which hinder the preservation of the complete style of the text, e.g., its texture and structure. These corrosion issues, such as graffiti on signs and incomplete signatures, make the text difficult to understand and thereby pose significant challenges to downstream applications, e.g., scene text recognition and signature identification. Notably, current inpainting techniques often fail to address this problem adequately and have difficulty restoring accurate text images with reasonable and consistent styles. Formulating this as an open problem of text image inpainting, this paper aims to build a benchmark to facilitate its study. To that end, we establish two text inpainting datasets, which contain scene text images and handwritten text images, respectively. Each includes images derived from real-life and synthetic datasets, featuring pairs of original images, corrupted images, and other auxiliary information. On top of the datasets, we further develop a novel neural framework, the Global Structure-guided Diffusion Model (GSDM), as a potential solution. Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts. The efficacy of our approach is demonstrated by a thorough empirical study, including a substantial boost in both recognition accuracy and image quality. These findings not only highlight the effectiveness of our method but also underscore its potential to enhance the broader field of text image understanding and processing. Code and datasets are available at: this https URL.

URL

https://arxiv.org/abs/2401.14832

PDF

https://arxiv.org/pdf/2401.14832.pdf

