Abstract
Real-world text is often damaged by corrosion arising from environmental or human factors, which prevents the text's full style, e.g., texture and structure, from being preserved. Such corrosion, as seen in graffiti on signs and incomplete signatures, makes the text difficult to understand and poses significant challenges to downstream applications, e.g., scene text recognition and signature identification. Notably, current inpainting techniques often fail to address this problem adequately and struggle to restore accurate text images with reasonable and consistent styles. Formulating this as an open problem of text image inpainting, this paper builds a benchmark to facilitate its study. To this end, we establish two text inpainting datasets, which contain scene text images and handwritten text images, respectively. Each includes images adapted from real-life and synthetic datasets, and provides pairs of original and corrupted images together with other auxiliary information. On top of the datasets, we further develop a novel neural framework, the Global Structure-guided Diffusion Model (GSDM), as a potential solution. Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts. The efficacy of our approach is demonstrated by a thorough empirical study, which shows a substantial boost in both recognition accuracy and image quality. These findings not only highlight the effectiveness of our method but also underscore its potential to benefit the broader field of text image understanding and processing. Code and datasets are publicly available.
URL
https://arxiv.org/abs/2401.14832