Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

2019-04-10 08:02:35
Jiwoon Ahn, Sunghyun Cho, Suha Kwak

Abstract

This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. Our approach generates pseudo instance segmentation labels for training images, which are then used to train a fully supervised model. To generate the pseudo labels, we first identify confident seed areas of object classes from the attention maps of an image classification model, and then propagate them to discover the entire instance areas with accurate boundaries. To this end, we propose IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. This makes it possible to assign instance labels to the seeds and to propagate them within the boundaries so that the entire areas of instances can be estimated accurately. Furthermore, IRNet is trained with inter-pixel relations on the attention maps, so no extra supervision is required. Our method with IRNet achieves outstanding performance on the PASCAL VOC 2012 dataset, surpassing not only the previous state of the art trained with the same level of supervision, but also some previous models that rely on stronger supervision.
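As a rough illustration of the idea of propagating seed labels within boundaries, the toy sketch below spreads instance labels over a pixel grid without crossing high-boundary pixels. It is not IRNet's actual procedure: the abstract does not specify the propagation mechanism, so a plain breadth-first flood fill is swapped in here, and the function name `propagate_seeds`, the `threshold` parameter, and the input layout are all assumptions made for this example.

```python
from collections import deque


def propagate_seeds(seeds, boundary, threshold=0.5):
    """Spread instance labels from seed pixels without crossing boundaries.

    seeds:    2D list of ints, 0 = unlabeled, k > 0 = instance label k
    boundary: 2D list of floats in [0, 1], boundary strength per pixel
    threshold: hypothetical cutoff above which a pixel blocks propagation
    """
    height, width = len(seeds), len(seeds[0])
    labels = [row[:] for row in seeds]  # copy so the input is not modified
    queue = deque(
        (r, c) for r in range(height) for c in range(width) if seeds[r][c] > 0
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue  # outside the image
            if labels[nr][nc] != 0 or boundary[nr][nc] >= threshold:
                continue  # already labeled, or blocked by a boundary pixel
            labels[nr][nc] = labels[r][c]
            queue.append((nr, nc))
    return labels


# Two seeds separated by one boundary pixel in a single-row image.
print(propagate_seeds([[1, 0, 0, 0, 2]], [[0, 0, 1, 0, 0]]))
# prints [[1, 1, 0, 2, 2]]
```

In this simplified setting each seed claims the pixels it can reach first, and boundary pixels stay unlabeled; a real system would work with soft scores rather than hard labels.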

URL

https://arxiv.org/abs/1904.05044

PDF

https://arxiv.org/pdf/1904.05044.pdf

