Abstract
When training a neural network to perform semantic segmentation, creating pixel-level annotations for every image in the database is a tedious task, and even more so when working with aerial or satellite images, which are usually very large. With that in mind, we investigate how to use image-level annotations to perform semantic segmentation. Image-level annotations are much less expensive to acquire than pixel-level annotations, but they provide far less information for training the model: from the image labels alone, the model must discover by itself how to classify the different regions of each image. In this work, we use the method proposed by Ahn and Kwak [1] to produce pixel-level annotations from image-level annotations. We compare the overall quality of our generated dataset with that of the original dataset. In addition, we propose an adaptation of AffinityNet that allows us to perform semantic segmentation directly. Our results show that the generated labels lead to the same performance when training several segmentation networks. Moreover, the quality of the semantic segmentation performed directly by AffinityNet and the random walk is close to that of the best fully supervised approaches.
URL
https://arxiv.org/abs/1904.03983