Offline Extraction of Indic Regional Language from Natural Scene Image using Text Segmentation and Deep Convolutional Sequence

2018-06-16 08:31:06
Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna Bose, Abhishek Jha, Koushik Dasgupta

Abstract

Extracting the regional language from a natural scene image is a challenging problem because it depends on the text information extracted from the image. Text extraction, in turn, is affected by varying lighting conditions, arbitrary orientation, inadequate text information, heavy background influence over the text, and changes in text appearance. This paper presents a novel unified method for tackling these challenges. The proposed work applies an image correction and segmentation technique on top of an existing text detection pipeline, EAST (An Efficient and Accurate Scene Text Detector). EAST uses the standard PVANet architecture for feature extraction and non-maximum suppression (NMS) to detect text in the image. Text recognition uses a combined architecture of a Maxout convolutional neural network (CNN) and a bidirectional long short-term memory (LSTM) network. After the text is recognized with this deep learning approach, text in native languages is translated to English and tokenized using standard text tokenizers. Tokens that most likely represent a location are used to find the Global Positioning System (GPS) coordinates of that location, and the regional languages spoken there are then extracted. The proposed method is tested on a self-generated dataset collected from Government of India data and is also evaluated on a standard dataset to measure its performance. A comparative study with several state-of-the-art methods for text detection, recognition, and regional language extraction from images shows that the proposed method outperforms the existing methods.
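
The NMS step mentioned above is what turns EAST's many overlapping candidate boxes into one box per text region. As a rough illustration only, here is plain greedy NMS on axis-aligned boxes in Python. This is not the authors' code: EAST itself predicts rotated boxes or quadrilaterals and uses a locality-aware variant of NMS, and the (x1, y1, x2, y2) box format, the `iou`/`nms` names, and the 0.5 threshold are assumptions chosen for this sketch.

```python
from typing import List, Tuple

# Assumed box format for this sketch: axis-aligned (x1, y1, x2, y2).
# EAST itself predicts rotated boxes / quadrilaterals, which this ignores.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes, best first."""
    # Visit candidates from highest to lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining candidate that overlaps the chosen box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Two heavily overlapping detections of the same word plus one separate word.
    boxes = [(10, 10, 110, 40), (12, 11, 112, 42), (200, 10, 300, 40)]
    scores = [0.9, 0.8, 0.95]
    print(nms(boxes, scores))
```

Running the example prints `[2, 0]`: box 2 and box 0 are kept, while box 1 is suppressed because its IoU with box 0 is about 0.87, above the assumed 0.5 threshold.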

URL

https://arxiv.org/abs/1806.06208

PDF

https://arxiv.org/pdf/1806.06208.pdf