Paper Reading AI Learner

Weakly Supervised Semantic Segmentation of Satellite Images

2019-04-08 12:09:11
Adrien Nivaggioli, Hicham Randrianarivo (CEDRIC)

Abstract

Creating pixel-level annotations for every image in a database is a tedious task when training a neural network to perform semantic segmentation, and it is even worse with aerial or satellite images, which are usually very large. With that in mind, we investigate how to use image-level annotations to perform semantic segmentation. Image-level annotations are much less expensive to acquire than pixel-level annotations, but a lot of information is lost for the training of the model: from the image-level annotations alone, the model must figure out by itself how to classify the different regions of the image. In this work, we use the method proposed by Ahn and Kwak [1] to produce pixel-level annotations from image-level annotations. We compare the overall quality of our generated dataset with that of the original dataset. In addition, we propose an adaptation of AffinityNet that allows us to perform semantic segmentation directly. Our results show that the generated labels lead to the same performance when training several segmentation networks, and that the quality of the semantic segmentation produced directly by AffinityNet and the random walk is close to that of the best fully supervised approaches.
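The core refinement step the abstract alludes to is a random walk over pixel affinities: coarse class activation scores are repeatedly diffused along a transition matrix derived from learned pairwise affinities, so that labels spread within coherent regions. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name `propagate_cam`, the toy affinity matrix, and the parameters `n_iters` and `alpha` are illustrative assumptions.

```python
import numpy as np

def propagate_cam(cam, affinity, n_iters=8, alpha=1.0):
    """Random-walk refinement of class activation scores (sketch).

    cam:      (C, N) array of per-pixel class scores (flattened CAMs).
    affinity: (N, N) symmetric, non-negative pixel affinities
              (in the paper these come from AffinityNet; here they
              are given directly for illustration).
    Returns a (C, N) array of diffused scores.
    """
    # Sharpen affinities (the paper uses an exponent beta; alpha here
    # plays that role) and row-normalize to get a transition matrix T,
    # so each row is a probability distribution over neighbors.
    T = affinity ** alpha
    T = T / T.sum(axis=1, keepdims=True)
    out = cam
    for _ in range(n_iters):
        # One random-walk step: each pixel's score becomes a convex
        # combination of its neighbors' scores, weighted by affinity.
        out = out @ T.T
    return out

# Toy example: a 1-D "image" of 6 pixels, true segments [A A A | B B B].
seg = np.array([0, 0, 0, 1, 1, 1])
# High affinity within a segment, low across, as a trained affinity
# network might predict.
affinity = np.where(seg[:, None] == seg[None, :], 1.0, 0.05)
# Coarse, peaky CAM: class 0 fires only on pixel 0, class 1 only on pixel 5.
cam = np.zeros((2, 6))
cam[0, 0] = 1.0
cam[1, 5] = 1.0
refined = propagate_cam(cam, affinity)
labels = refined.argmax(axis=0)  # pixel-level labels recovered: [0 0 0 1 1 1]
```

The diffusion fills in the pixels the coarse CAM missed because high intra-segment affinities carry each class's score across its whole region, which is the intuition behind turning image-level supervision into pixel-level labels.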

URL

https://arxiv.org/abs/1904.03983

PDF

https://arxiv.org/pdf/1904.03983.pdf

