Abstract
Recent advances in deep neural networks have markedly improved performance on computer vision tasks, yet these networks are typically specialized and often require extensive data and substantial computational power. To address these requirements, this study presents a novel neural network model for optical character recognition (OCR) across diverse domains that leverages multi-task learning to improve efficiency and generalization. The model is designed to adapt rapidly to new domains, remain compact enough to reduce computational resource demands, maintain high accuracy, retain knowledge from previous learning, and allow domain-specific performance improvements without retraining the entire model. Evaluation on open datasets shows that the model significantly reduces the number of trainable parameters without sacrificing performance, indicating its potential as a scalable and adaptable solution in computer vision, particularly for optical text recognition.
URL
https://arxiv.org/abs/2401.00971