
Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

2019-04-26 06:47:58
Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang

Abstract

Semantic segmentation has achieved huge progress via adopting deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on large amounts of pixel-level annotations, which are expensive and time-consuming to obtain. To address this problem, learning to segment with weak supervision from bounding boxes is an attractive alternative. Making full use of both the class-level and region-level supervision provided by bounding boxes is the critical challenge in this weakly supervised setting. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue, and then propose a filling rate guided adaptive loss (FR-Loss) that helps the model ignore wrongly labeled pixels in the proposals. Unlike previous methods that directly train models on fixed individual segment proposals, our method adjusts learning with global statistical information, which helps reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.
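
The abstract describes two components that lend themselves to a short illustration: a box-driven class-wise mask that suppresses regions outside each class's boxes, and a filling-rate guided loss that keeps only the most confident pixels inside each box. The sketch below is a minimal PyTorch interpretation under stated assumptions: the function names (box_class_masks, filling_rate_loss), tensor shapes, and the hard top-k rule are hypothetical readings of the abstract, and the paper's BCM actually learns the class-wise masks rather than rasterizing the boxes directly.

```python
import torch
import torch.nn.functional as F

def box_class_masks(boxes, labels, num_classes, h, w):
    # Rasterize each class's boxes into a binary class-wise mask.
    # (Hypothetical helper: the paper's BCM *learns* these masks from the
    # boxes; here the boxes are used directly as a hard spatial prior.)
    # boxes: (N, 4) [x1, y1, x2, y2]; labels: (N,) class ids >= 1.
    masks = torch.zeros(num_classes, h, w)
    masks[0] = 1.0  # background is never masked out
    for (x1, y1, x2, y2), c in zip(boxes.long(), labels.long()):
        masks[c, y1:y2, x1:x2] = 1.0
    # Multiply onto per-class score maps to drop regions irrelevant to each class.
    return masks

def filling_rate_loss(logits, proposal, boxes, labels, mean_fill_rates):
    # Sketch of a filling-rate guided loss. Assumed inputs:
    #   logits:          (C, H, W) class scores from the segmentation head
    #   proposal:        (H, W) pixel-level pseudo label built from the boxes
    #   mean_fill_rates: (C,) dataset-wide average fraction of a class's box
    #                    area that its proposals mark as foreground (global prior)
    # Inside each box, only the top fill_rate * area pixels (ranked by the
    # class score) keep their pseudo label; the rest are ignored (label 255).
    target = proposal.clone()
    for (x1, y1, x2, y2), c in zip(boxes.long(), labels.long()):
        scores = logits[c, y1:y2, x1:x2]
        k = int(mean_fill_rates[c] * scores.numel())
        thresh = scores.flatten().topk(k).values.min() if k > 0 else scores.max() + 1
        region = target[y1:y2, x1:x2]                 # view into `target`
        region[(scores < thresh) & (region == c)] = 255
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0).long(),
                           ignore_index=255)
```

The mean filling rates would be estimated over the whole training set from the box-derived proposals, which is what makes the supervision a global statistic rather than a per-image one; the exact adaptive weighting used by FR-Loss is not specified in the abstract, so the top-k truncation above is only one plausible instantiation.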

URL

https://arxiv.org/abs/1904.11693

PDF

https://arxiv.org/pdf/1904.11693.pdf
