IndicSTR12: A Dataset for Indic Scene Text Recognition

2024-03-12 18:14:48
Harsh Lunia, Ajoy Mondal, C V Jawahar

Abstract

Scene Text Recognition (STR) is increasingly important in today's digital world. Data-intensive deep learning approaches that automatically learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. For Indian languages, which are syntactically and semantically more complex and are spoken and read by 1.3 billion people, less work and fewer datasets are available. This paper aims to address the lack of a comprehensive dataset for Indian languages by proposing the largest and most comprehensive real dataset, IndicSTR12, and benchmarking STR performance on 12 major Indian languages. A few works have addressed the same issue, but to the best of our knowledge, they focused on a small number of Indian languages. The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries, while its multilingualism will catalyse the development of robust text detection and recognition models. It was created specifically for a group of related languages with different scripts. The dataset contains over 27,000 word-images gathered from various natural scenes, with over 1,000 word-images for each language. Unlike previous datasets, the images cover a broader range of realistic conditions, including blur, illumination changes, occlusion, non-iconic text, low resolution, and perspective text. Along with the new dataset, we provide high-performing baselines on three models: PARSeq, CRNN, and STARNet.

URL

https://arxiv.org/abs/2403.08007

PDF

https://arxiv.org/pdf/2403.08007.pdf

