Paper Reading AI Learner

An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition


Abstract

The use of artificial intelligence technology in education is growing rapidly, with increasing attention being paid to handwritten mathematical expression recognition (HMER) by researchers. However, many existing methods for HMER may fail to accurately read formulas with complex structures, as the attention results can be inaccurate due to illegible handwriting or large variations in writing styles. Our proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques. Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects. The detection results are then integrated into the bidirectional gated recurrent unit (BiGRU) and the baseline symbol relationship tree (BSRT) to determine the relationships between symbols and numbers. The experiments demonstrate that the proposed method outperforms those encoder-decoder networks in recognizing complex handwritten mathematical expressions. This is due to the precise detection of symbols and numbers. Our research has the potential to make valuable contributions to the field of HMER. This could be applied in various practical scenarios, such as assignment grading in schools and information entry of paper documents.

Abstract (translated)

人工智能技术在教育领域的应用迅速增长,研究人员越来越关注手写数学表达识别(HMER)。然而,许多现有方法可能无法准确地识别具有复杂结构的复杂数学公式,因为注意结果可能因不清晰的手写或写作风格的巨大差异而出现误差。我们提出的智能检测网络(IDN) for HMER与传统编码器-解码器方法不同,因为它利用了物体检测技术。具体来说,我们开发了一个增强的 YOLOv7 网络,可以准确检测数字和符号物体。检测结果 then 集成到双向门控循环单元(BiGRU)和基线符号关系树(BSRT)中,以确定符号和数字之间的关系。实验证明,与传统的编码器-解码器网络相比,该方法在识别复杂手写数学表达式方面表现出优异性能。这是由于符号和数字的准确检测。我们的研究有望为 H梅尔领域做出有价值的贡献。这可以在各种实际场景中应用,例如学校中的作业评分和纸质文件的信息录入等。

URL

https://arxiv.org/abs/2311.15273

PDF

https://arxiv.org/pdf/2311.15273.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot