Paper Reading AI Learner

Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features

2018-06-11 11:21:42
Maroua Tounsi, Ikram Moalla, Frank Lebourgeois, Adel M. Alimi

Abstract

The recognition of text in camera-captured images has been an important research topic over the past few decades. This gave rise to Scene Character Recognition (SCR), which is an important step in the scene text recognition pipeline. In this paper, we extend the Bag of Features (BoF)-based model with a stacked sparse auto-encoder technique that represents local features for accurate SCR across different languages. In the feature coding step, a Sparse Auto-encoder (SAE)-based strategy is applied to enhance the representative and discriminative abilities of the image features. This technique provides a more efficient feature representation, and therefore better recognition accuracy, than other feature coding techniques. The coding step is followed by Spatial Pyramid Matching (SPM) and max-pooling, which preserve the spatial information and form the global image signature. Our system was evaluated extensively on six scene character datasets covering five different languages. The experimental results demonstrate the efficiency of our system for multilingual SCR.
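The coding and pooling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single auto-encoder hidden layer with a sigmoid activation (the paper uses a stacked SAE), untrained random weights in place of learned ones, random vectors in place of real local descriptors, and a three-level pyramid. The names `sae_encode` and `spm_max_pool` and all sizes are hypothetical.

```python
import math
import random


def sae_encode(descriptor, weights, biases):
    """Hidden-layer activation of one auto-encoder layer: sigmoid(W x + b)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, descriptor)) + b)))
        for row, b in zip(weights, biases)
    ]


def spm_max_pool(codes, positions, levels=(1, 2, 4)):
    """Max-pool codes inside each cell of a spatial pyramid, then concatenate.

    positions are (x, y) pairs normalised to [0, 1). Empty cells stay at zero,
    which is safe here because sigmoid codes are never negative.
    """
    k = len(codes[0])
    signature = []
    for g in levels:
        cells = [[0.0] * k for _ in range(g * g)]
        for code, (x, y) in zip(codes, positions):
            cx = min(int(x * g), g - 1)
            cy = min(int(y * g), g - 1)
            cell = cells[cy * g + cx]
            for j, value in enumerate(code):
                if value > cell[j]:
                    cell[j] = value
        for cell in cells:
            signature.extend(cell)
    return signature


# Assumed toy sizes: 20 local descriptors of dimension 8, 4 hidden units.
random.seed(0)
dim, hidden, n = 8, 4, 20
descriptors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
positions = [(random.random(), random.random()) for _ in range(n)]
weights = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
biases = [0.0] * hidden

codes = [sae_encode(d, weights, biases) for d in descriptors]
signature = spm_max_pool(codes, positions)
print(len(signature))  # 84 = 4 hidden units * (1 + 4 + 16) pyramid cells
```

In a full system the resulting signature would be passed to a classifier, and the encoder weights would be learned with a sparsity penalty rather than drawn at random.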

URL

https://arxiv.org/abs/1806.07374

PDF

https://arxiv.org/pdf/1806.07374.pdf

