MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

2024-04-26 04:49:42
Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Abstract

Bottom-up text detection methods play an important role in arbitrary-shape scene text detection, but two restrictions prevent them from reaching their full potential: 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a novel approach, named "MorphText", which captures the regularity of text by embedding deep morphology for arbitrary-shape text detection. To this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that the proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches.
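
The abstract names its two modules after the classical morphological operators opening and closing. As a rough illustration of the intuition only, the sketch below applies classical binary opening and closing to a toy segment mask: opening removes an isolated false detection, and closing along the horizontal direction bridges the gap between two neighbouring segments. This is not the paper's DMOP or DMCL, which are learned, deep modules. The function names, the fixed 1x3 structuring element, and the toy mask are all assumptions made for this example.

```python
def dilate(mask, kh, kw):
    """Binary dilation with a flat kh x kw rectangle (odd sizes), centred."""
    h, w = len(mask), len(mask[0])
    rh, rw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if any in-bounds neighbour under the element is set.
            out[y][x] = int(any(
                mask[yy][xx]
                for yy in range(max(0, y - rh), min(h, y + rh + 1))
                for xx in range(max(0, x - rw), min(w, x + rw + 1))
            ))
    return out


def erode(mask, kh, kw):
    """Binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    rh, rw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if every neighbour under the element is set.
            out[y][x] = int(all(
                0 <= yy < h and 0 <= xx < w and mask[yy][xx]
                for yy in range(y - rh, y + rh + 1)
                for xx in range(x - rw, x + rw + 1)
            ))
    return out


def opening(mask, kh, kw):
    """Erosion followed by dilation: removes structures smaller than the element."""
    return dilate(erode(mask, kh, kw), kh, kw)


def closing(mask, kh, kw):
    """Dilation followed by erosion: fills gaps smaller than the element."""
    return erode(dilate(mask, kh, kw), kh, kw)


# Toy mask (hypothetical): one isolated false detection in row 0,
# and two text segments separated by a one-pixel gap in row 1.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

opened = opening(mask, 1, 3)    # drops the isolated pixel, keeps both segments
linked = closing(opened, 1, 3)  # bridges the gap along the horizontal direction

for row in linked:
    print("".join("#" if v else "." for v in row))
```

Here the structuring element is fixed and horizontal. By the abstract's description, DMCL instead lets each text instance stretch along its own most significant orientation, which a fixed element cannot reproduce.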

URL

https://arxiv.org/abs/2404.17151

PDF

https://arxiv.org/pdf/2404.17151.pdf

