Abstract
Existing methods for scene text detection fall into two paradigms: segmentation-based and anchor-based. Segmentation-based methods are well suited to irregular shapes but struggle with compact or overlapping layouts. Conversely, anchor-based approaches handle complex layouts well but struggle with irregular shapes. To combine their strengths and overcome their respective weaknesses, we propose a Complementary Proposal Network (CPN) that integrates semantic and geometric information seamlessly and in parallel. The CPN comprises two efficient networks for proposal generation: the Deformable Morphology Semantic Network, which generates semantic proposals using a novel deformable morphological operator, and the Balanced Region Proposal Network, which produces geometric proposals from pre-defined anchors. To further enhance complementarity, we introduce an Interleaved Feature Attention module that lets semantic and geometric features interact deeply before proposal generation. By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches by significant margins at comparable computational cost. Specifically, our approach achieves improvements of 3.6%, 1.3%, and 1.0% on the challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively. Code for our method will be released.
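The abstract does not spell out how the deformable morphological operator works. For readers unfamiliar with the classical operation the name alludes to, the following is a minimal sketch of plain fixed-kernel binary dilation, followed by a bounding box around the result. It is not the paper's operator: the function names `dilate` and `bounding_box`, the square kernel, and the toy mask are all assumptions made for illustration.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 2D 0/1 mask with a (2*radius+1)-wide square kernel.

    Illustrative only: a fixed square kernel, not the deformable operator
    described in the abstract.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Each foreground pixel switches on every pixel under the kernel
            # centered on it, clipped to the image bounds.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[ny][nx] = 1
    return out


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the foreground, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical toy input: a thin, shrunken text region on a 5x7 grid.
shrunk = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

expanded = dilate(shrunk, radius=1)
proposal = bounding_box(expanded)
print(proposal)  # (1, 1, 5, 3)
```

A fixed kernel grows every region by the same amount in every direction. A deformable variant would presumably adapt that growth to the local shape of the text, but that reading is an inference from the operator's name rather than something the abstract states.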
URL
https://arxiv.org/abs/2402.11540