Abstract
Super-resolution reconstruction aims to generate images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned by deep learning produce results of outstanding visual quality, but it is seldom verified whether those results are a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To that end, we propose to train deep networks for single-image super-resolution in a task-driven way so that they are better adapted to text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that combines components related to text detection with components guided by image similarity. The reported results are encouraging and constitute an important step towards real-world super-resolution of document images.
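The abstract does not spell out the form of the multi-task loss, so the following is only an illustrative sketch. It assumes the simplest possible combination: a weighted sum of an image-similarity term and a text-detection term, both measured as mean squared error. The names `multi_task_loss` and `task_weight`, and the idea of comparing detector score maps, are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally sized flat arrays."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_task_loss(
    sr_pixels: Sequence[float],
    hr_pixels: Sequence[float],
    sr_text_map: Sequence[float],
    hr_text_map: Sequence[float],
    task_weight: float = 0.5,
) -> float:
    """Weighted sum of an image-similarity term and a task-driven term.

    Assumption: both terms are plain MSE. sr_* values come from the
    super-resolved image, hr_* values from the high-resolution reference.
    The *_text_map arguments stand in for the output of some text detector
    (for example, per-pixel text scores) applied to each image.
    """
    similarity_term = mean_squared_error(sr_pixels, hr_pixels)
    task_term = mean_squared_error(sr_text_map, hr_text_map)
    return similarity_term + task_weight * task_term


if __name__ == "__main__":
    # Toy values only; real inputs would be flattened images and score maps.
    loss = multi_task_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0])
    print(loss)
```

Keeping the similarity term alongside the task term reflects the motivation stated in the abstract: a loss limited to the task alone leaves the reconstruction problem heavily ill-posed, so image similarity acts as an additional constraint.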
URL
https://arxiv.org/abs/2407.08993