Abstract
The performance of optical character recognition (OCR) relies heavily on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models, making it possible to apply domain-specific diffusion models to other pairs. DECDM trains on one dataset at a time, eliminating the need to scan both datasets concurrently and effectively preserving data privacy for both the source and target domains. We also introduce simple data augmentation strategies to improve character-glyph preservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic and benchmark datasets for tasks such as document denoising and shadow removal, and demonstrate its superior performance both quantitatively and qualitatively.
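The core idea of translating between domains with two independently trained diffusion models can be illustrated with a toy sketch: deterministically invert (encode) an input with the source-domain model, then run the reverse process (decode) with the target-domain model. This is only an assumed DDIM-style encode/decode scheme, not the paper's actual implementation; the closed-form "denoisers" and the 1D data below are hypothetical stand-ins for trained networks on document images.

```python
import numpy as np

# Toy sketch: unpaired translation by chaining two independently trained
# diffusion models. Each domain is modeled as a point mass, so its optimal
# noise predictor has a closed form (a hypothetical stand-in for a network).

T = 50
alphas = np.linspace(0.999, 0.90, T)  # per-step noise schedule (toy values)
abar = np.cumprod(alphas)             # cumulative signal-retention products

def eps_hat(x, t, mu):
    # Hypothetical denoiser for a domain concentrated at `mu`:
    # predicts the noise component of x at diffusion step t.
    return (x - np.sqrt(abar[t]) * mu) / np.sqrt(1.0 - abar[t])

def ddim_step(x, t_from, t_to, eps):
    # Deterministic DDIM update between noise levels abar[t_from] -> abar[t_to].
    x0 = (x - np.sqrt(1.0 - abar[t_from]) * eps) / np.sqrt(abar[t_from])
    return np.sqrt(abar[t_to]) * x0 + np.sqrt(1.0 - abar[t_to]) * eps

def translate(x, mu_src, mu_tgt):
    # Encode: run the source model's deterministic process toward noise.
    for t in range(T - 1):
        x = ddim_step(x, t, t + 1, eps_hat(x, t, mu_src))
    # Decode: run the target model's deterministic process back to data.
    for t in range(T - 1, 0, -1):
        x = ddim_step(x, t, t - 1, eps_hat(x, t, mu_tgt))
    return x

noisy = np.array([2.05])  # sample near the source-domain mode (2.0)
clean = translate(noisy, mu_src=2.0, mu_tgt=0.0)  # lands near target mode 0.0
```

Note that each model sees only its own domain's data during training; the two are coupled only at translation time, which is what removes the need for paired (noisy, clean) scans.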
URL
https://arxiv.org/abs/2311.09625