Abstract
The absence of large-scale datasets with pixel-level supervision is a significant obstacle to training deep convolutional networks for scene text segmentation. For this reason, synthetic data generation is normally employed to enlarge the training dataset. Nonetheless, synthetic data cannot reproduce the complexity and variability of natural images. In this paper, a weakly supervised learning approach is used to reduce the shift between training on real and synthetic data. Pixel-level supervision is generated for a text detection dataset (i.e., one where only bounding-box annotations are available). In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which provides pixel-level supervision for the COCO-Text dataset, is created and released. The generated annotations are used to train a deep convolutional neural network for semantic segmentation. Experiments show that the proposed dataset can be used instead of synthetic data, requiring only a fraction of the training samples while significantly improving performance.
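The abstract does not spell out how bounding-box annotations become pixel-level labels, so the following is only a minimal sketch of one plausible step, not the authors' procedure. It assumes that some segmentation model has already produced a per-pixel text probability map, and it keeps as text only those pixels that fall inside an annotated box and exceed a threshold. The function name `boxes_to_pixel_labels`, the box convention (exclusive upper bounds), and the threshold value are all hypothetical.

```python
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max), upper bounds exclusive.
Box = Tuple[int, int, int, int]


def boxes_to_pixel_labels(
    prob_map: List[List[float]],
    boxes: List[Box],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Return a binary mask: 1 where a pixel lies inside at least one box
    and its (assumed) text probability is >= threshold, 0 elsewhere."""
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    labels = [[0] * width for _ in range(height)]

    for x_min, y_min, x_max, y_max in boxes:
        # Clip the box to the image so out-of-range annotations are safe.
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                if prob_map[y][x] >= threshold:
                    labels[y][x] = 1
    return labels


if __name__ == "__main__":
    # Toy 3x4 probability map; values are illustrative, not real model output.
    probs = [
        [0.9, 0.9, 0.1, 0.9],
        [0.2, 0.8, 0.7, 0.9],
        [0.9, 0.1, 0.6, 0.9],
    ]
    for row in boxes_to_pixel_labels(probs, [(1, 0, 3, 2)]):
        print(row)
```

High-probability pixels outside every box are discarded, which is the sense in which the bounding boxes act as weak supervision in this sketch.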
URL
https://arxiv.org/abs/1904.00818