Abstract
Scaling laws relating model size, data volume, computation, and model performance have been extensively studied in the field of Natural Language Processing (NLP). However, such scaling laws have not yet been investigated in Optical Character Recognition (OCR). To address this, we conducted a comprehensive study examining the correlation between performance and the scale of models, data volume, and computation in the field of text recognition. The study demonstrates smooth power laws between performance and model size, as well as training data volume, when other influencing factors are held constant. Additionally, we constructed a large-scale dataset called REBU-Syn, which comprises 6 million real samples and 18 million synthetic samples. Based on our scaling law and new dataset, we successfully trained a scene text recognition model, achieving a new state-of-the-art on 6 common test benchmarks with a top-1 average accuracy of 97.42%.
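The power-law relationship the abstract describes can be illustrated with a minimal sketch: fitting error = a * N^b to (model size, error rate) pairs via linear regression in log-log space. All numbers below are hypothetical placeholders, not figures from the paper.

```python
# Minimal sketch: fit a power law err = a * N**b to hypothetical
# (model size, error rate) pairs. The data points are illustrative only.
import numpy as np

model_sizes = np.array([1e6, 1e7, 1e8, 1e9])       # parameter counts (hypothetical)
error_rates = np.array([0.20, 0.12, 0.07, 0.042])  # 1 - accuracy (hypothetical)

# In log space the power law becomes linear: log(err) = log(a) + b * log(N),
# so an ordinary least-squares line fit recovers the exponent b.
b, log_a = np.polyfit(np.log(model_sizes), np.log(error_rates), 1)
a = np.exp(log_a)

def predicted_error(n_params: float) -> float:
    """Extrapolate the fitted power law to a new model size."""
    return a * n_params ** b

print(f"fitted exponent b = {b:.3f}")
```

A negative exponent b means error falls smoothly as model size grows, which is the qualitative shape of the scaling curves the study reports.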
URL
https://arxiv.org/abs/2401.00028