Paper Reading AI Learner

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

2024-04-23 17:39:27
Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

Abstract

This paper presents the UniMER dataset, offering the first study of Mathematical Expression Recognition (MER) oriented toward complex real-world scenarios. The UniMER dataset consists of a large-scale training set, UniMER-1M, which provides unprecedented scale and diversity with one million training instances, and a meticulously designed test set, UniMER-Test, which reflects the diverse range of formula distributions prevalent in real-world scenarios. The UniMER dataset therefore enables both the training of a robust, high-accuracy MER model and a comprehensive evaluation of model performance. Moreover, we introduce the Universal Mathematical Expression Recognition Network (UniMERNet), an innovative framework designed to enhance MER in practical scenarios. UniMERNet incorporates a Length-Aware Module to process formulas of varied lengths efficiently, enabling the model to handle complex mathematical expressions with greater accuracy. In addition, UniMERNet employs our UniMER-1M data and image augmentation techniques to improve the model's robustness under different noise conditions. Extensive experiments demonstrate that UniMERNet outperforms existing MER models, setting a new benchmark across various scenarios and ensuring superior recognition quality in real-world applications. The dataset and model are available at this https URL.
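The abstract states that UniMERNet relies on image augmentation to remain robust under different noise conditions, but it does not describe the augmentation pipeline itself. The sketch below is a rough illustration of that idea rather than the paper's actual method: it applies a few common degradations (defocus blur, additive Gaussian noise, low-quality JPEG re-encoding) to a rendered formula image using PIL and NumPy, and all specific choices, probabilities, and parameter ranges are assumptions.

```python
# A minimal sketch of noise-oriented augmentation for rendered formula images.
# The concrete augmentations and all parameter ranges are assumptions; the
# abstract only says that image augmentation is used to improve robustness
# under different noise conditions.
import io
import random

import numpy as np
from PIL import Image, ImageFilter


def augment_formula_image(img: Image.Image) -> Image.Image:
    """Randomly degrade a formula image to mimic scans, photos, and screenshots."""
    img = img.convert("RGB")

    # Mild defocus blur, as in an out-of-focus scan or photograph.
    if random.random() < 0.5:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.3, 1.5)))

    # Additive Gaussian pixel noise, mimicking camera sensor noise.
    if random.random() < 0.5:
        arr = np.asarray(img).astype(np.float32)
        arr += np.random.normal(0.0, 8.0, size=arr.shape)
        img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # Low-quality JPEG re-encoding, mimicking compressed screenshots.
    if random.random() < 0.5:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 70))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")

    return img


if __name__ == "__main__":
    # Usage example with a blank canvas standing in for a rendered formula crop.
    sample = Image.new("RGB", (512, 64), color="white")
    augmented = augment_formula_image(sample)
    augmented.save("augmented_formula.png")
```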

Abstract (translated)

This paper introduces the UniMER dataset, providing the first study of Mathematical Expression Recognition (MER) in complex real-world scenarios. The UniMER dataset includes a large-scale training set, UniMER-1M, offering unprecedented scale and diversity, and a carefully designed test set, UniMER-Test, which reflects the formula distributions commonly found in real-world scenarios. The UniMER dataset therefore makes it possible to train a robust, high-accuracy MER model and to comprehensively evaluate model performance. In addition, we introduce the Universal Mathematical Expression Recognition Network (UniMERNet), a framework designed to strengthen MER in practical scenarios. UniMERNet includes a Length-Aware Module to handle formulas of varying lengths, enabling the model to process complex mathematical expressions more accurately. Furthermore, UniMERNet uses our UniMER-1M data and image augmentation techniques to improve the model's robustness under different noise conditions. Extensive experiments show that UniMERNet outperforms existing MER models, setting a new benchmark across various scenarios and ensuring superior recognition quality in real-world applications. The dataset and model are available at this https URL.

URL

https://arxiv.org/abs/2404.15254

PDF

https://arxiv.org/pdf/2404.15254.pdf

