The Character Error Vector: Decomposable errors for page-level OCR evaluation

2026-04-07 17:56:06
Jonathan Bourne, Mwiza Simbeye, Joseph Nockels

Abstract

The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making page-level OCR evaluation challenging, particularly when using data that do not share a labelling schema. We introduce the Character Error Vector (CEV), a bag-of-characters evaluator for OCR. The CEV can be decomposed into parsing, OCR, and interaction error components. This decomposability allows practitioners to focus on the part of the Document Understanding pipeline that will have the greatest impact on overall text extraction quality. The CEV can be implemented using a variety of methods, of which we demonstrate two: SpACER (Spatially Aware Character Error Rate) and a character distribution method using the Jensen-Shannon Distance. We validate the CEV's performance against other metrics: first, its relationship with CER; then, parse quality; and finally, as a direct measure of page-level OCR quality. The validation process shows that the CEV is a valuable bridge between parsing metrics and local metrics like CER. We analyse a dataset of archival newspapers made up of degraded images with complex layouts and find that state-of-the-art end-to-end models are outperformed by more traditional pipeline approaches. Whilst the CEV requires character-level positioning for optimal triage, thresholding on easily available values can predict the main error source with an F1 of 0.91. We provide the CEV as part of a Python library to support Document Understanding research.
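
To make the bag-of-characters idea concrete, the sketch below compares the character distribution of a ground-truth page with that of an OCR output using the Jensen-Shannon distance. It is only an illustration of the general technique named in the abstract, not the authors' library or their exact CEV formulation; the function names `char_distribution` and `jensen_shannon_distance` are hypothetical, and the choice of base-2 logarithms (which bounds the distance to [0, 1]) is an assumption.

```python
from collections import Counter
from math import log2, sqrt


def char_distribution(text):
    """Return the relative frequency of each character in `text`.

    Hypothetical helper for illustration; raises on empty input because
    an empty page has no well-defined character distribution.
    """
    if not text:
        raise ValueError("text must contain at least one character")
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), base 2.

    `p` and `q` map characters to probabilities. The result is 0 for
    identical distributions and 1 for distributions with no shared characters.
    """
    divergence = 0.0
    for ch in set(p) | set(q):
        pk = p.get(ch, 0.0)
        qk = q.get(ch, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution at this character
        if pk > 0:
            divergence += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * log2(qk / mk)
    # Guard against tiny negative values from floating-point rounding.
    return sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    truth = "column one text. column two text."
    # Same characters in a different reading order, as a parsing error might produce.
    reordered = "column two text. column one text."
    noisy = "co1umn one tcxt. colunn two text."

    print(jensen_shannon_distance(char_distribution(truth), char_distribution(reordered)))
    print(jensen_shannon_distance(char_distribution(truth), char_distribution(noisy)))
```

Because the comparison ignores character order, a page whose text blocks are read in the wrong sequence scores the same as a correctly ordered page, whereas character substitutions shift the distribution. This order-invariance is what lets a bag-of-characters measure stay defined when page parsing is imperfect. Separating parsing error from OCR error, as the abstract describes, needs additional information that this sketch does not model.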

URL

https://arxiv.org/abs/2604.06160

PDF

https://arxiv.org/pdf/2604.06160.pdf

