Abstract
Document Image Enhancement (DIE) serves as a critical component in Document AI systems, where its performance substantially determines the effectiveness of downstream tasks. To address the limitations of existing methods, which are confined to single-degradation restoration or grayscale image processing, we present the Global with Local Parametric Generation Enhancement Network (GL-PGENet), a novel architecture designed for multi-degraded color document images that ensures both efficiency and robustness in real-world scenarios. Our solution incorporates three key innovations. First, a hierarchical enhancement framework that integrates global appearance correction with local refinement, enabling coarse-to-fine quality improvement. Second, a Dual-Branch Local-Refine Network with a parametric generation mechanism that replaces conventional direct prediction, producing enhanced outputs through learned intermediate parametric representations rather than pixel-wise mapping; this approach enhances local consistency while improving model generalization. Finally, a modified NestUNet architecture incorporating dense blocks to effectively fuse low-level pixel features with high-level semantic features, specifically adapted to document image characteristics. In addition, to enhance generalization performance, we adopt a two-stage training strategy: large-scale pretraining on a synthetic dataset of 500,000+ samples followed by task-specific fine-tuning. Extensive experiments demonstrate the superiority of GL-PGENet, which achieves state-of-the-art SSIM scores of 0.7721 on DocUNet and 0.9480 on RealDAE. The model also exhibits remarkable cross-domain adaptability and maintains computational efficiency on high-resolution images without performance degradation, confirming its practical utility in real-world scenarios.
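The abstract describes the pipeline only at a high level. Below is a minimal PyTorch sketch of the general coarse-to-fine, parametric-generation idea it outlines: a global stage produces a coarse appearance correction, and a local stage predicts per-pixel parameters (here a simple gain/bias pair) that are applied to the image rather than predicting enhanced pixels directly. All module names, layer sizes, and the specific affine parameterization are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a coarse-to-fine document enhancement pipeline.
# A small "global" network estimates a low-resolution appearance correction;
# a "local" network predicts per-pixel gain/bias parameters applied to the
# input, instead of regressing enhanced pixels directly.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalCorrection(nn.Module):
    """Coarse appearance correction estimated at reduced resolution."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        small = F.interpolate(x, scale_factor=0.25, mode="bilinear",
                              align_corners=False)
        corr = self.net(small)
        corr = F.interpolate(corr, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)
        return torch.clamp(x + corr, 0.0, 1.0)  # coarse global result


class LocalParametricRefine(nn.Module):
    """Predicts per-pixel gain/bias parameters instead of enhanced pixels."""
    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.gain = nn.Conv2d(ch, 3, 3, padding=1)   # multiplicative branch
        self.bias = nn.Conv2d(ch, 3, 3, padding=1)   # additive branch

    def forward(self, x):
        feat = self.backbone(x)
        gain = torch.sigmoid(self.gain(feat)) * 2.0  # gain in (0, 2)
        bias = torch.tanh(self.bias(feat)) * 0.5     # bias in (-0.5, 0.5)
        return torch.clamp(gain * x + bias, 0.0, 1.0)


class CoarseToFineEnhancer(nn.Module):
    """Global correction followed by local parametric refinement."""
    def __init__(self):
        super().__init__()
        self.global_stage = GlobalCorrection()
        self.local_stage = LocalParametricRefine()

    def forward(self, x):
        coarse = self.global_stage(x)
        return self.local_stage(coarse)


if __name__ == "__main__":
    model = CoarseToFineEnhancer()
    degraded = torch.rand(1, 3, 256, 256)   # dummy degraded color document
    enhanced = model(degraded)
    print(enhanced.shape)  # torch.Size([1, 3, 256, 256])
```

One design point this sketch illustrates: because the local stage outputs smoothly varying correction parameters rather than raw pixels, nearby pixels receive similar transformations, which is one plausible reading of the claimed gain in local consistency and generalization.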
URL
https://arxiv.org/abs/2505.22021