Paper Reading AI Learner

Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation

2023-05-09 03:33:43
Tao Chen, Yazhou Yao, Jinhui Tang

Abstract

Weakly supervised semantic segmentation (WSSS) models relying on class activation maps (CAMs) have achieved desirable performance compared to their non-CAM-based counterparts. However, to make the WSSS task feasible, we need to generate pseudo labels by expanding the seeds from CAMs. This is complex and time-consuming, and it hinders the design of efficient end-to-end (single-stage) WSSS approaches. To tackle this dilemma, we resort to off-the-shelf, readily accessible saliency maps to obtain pseudo labels directly, given the image-level class labels. Nevertheless, the salient regions may contain noisy labels and cannot seamlessly fit the target objects, and saliency maps can only serve as approximate pseudo labels for simple images containing single-class objects. As a result, a segmentation model trained on these simple images cannot generalize well to complex images containing multi-class objects. To this end, we propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy-label and multi-class generalization issues. Specifically, we propose online noise filtering and progressive noise detection modules to tackle image-level and pixel-level noise, respectively. Moreover, we propose a bidirectional alignment mechanism that reduces the data distribution gap in both the input and output spaces through simple-to-complex image synthesis and complex-to-simple adversarial learning. MDBA reaches an mIoU of 69.5% and 70.2% on the validation and test sets of the PASCAL VOC 2012 dataset. The source code and models have been made available at \url{this https URL}.
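The abstract's starting point is that, for a simple image with one image-level class, a saliency map can be approximated as a pixel-level pseudo label. The sketch below illustrates that idea only. The two thresholds, the ignore band, and the function name saliency_to_pseudo_label are illustrative assumptions and are not taken from the paper. The sketch also does not model MDBA's denoising or alignment modules.

```python
from typing import List

BACKGROUND = 0   # PASCAL VOC background class index
IGNORE = 255     # conventional "ignore" value in VOC-style label maps


def saliency_to_pseudo_label(
    saliency: List[List[float]],
    class_labels: List[int],
    fg_thresh: float = 0.5,   # assumed threshold, not from the paper
    bg_thresh: float = 0.2,   # assumed threshold, not from the paper
) -> List[List[int]]:
    """Hypothetical helper: turn a saliency map into a pseudo label.

    Assumes saliency values in [0, 1]. Pixels at or above fg_thresh take the
    image's single class, pixels at or below bg_thresh become background,
    and the ambiguous band in between is marked IGNORE.
    """
    if len(class_labels) != 1:
        # Saliency only approximates a pseudo label for single-class images.
        raise ValueError("expected exactly one image-level class label")
    cls = class_labels[0]
    label = []
    for row in saliency:
        out_row = []
        for s in row:
            if s >= fg_thresh:
                out_row.append(cls)
            elif s <= bg_thresh:
                out_row.append(BACKGROUND)
            else:
                out_row.append(IGNORE)
        label.append(out_row)
    return label


if __name__ == "__main__":
    sal = [[0.9, 0.6, 0.1],
           [0.3, 0.05, 0.8]]
    # 15 is "person" in VOC's 0-20 class indexing
    print(saliency_to_pseudo_label(sal, [15]))  # → [[15, 15, 0], [255, 0, 15]]
```

The sketch uses plain nested lists so that it stays dependency-free. A real training pipeline would more likely operate on array or tensor maps. Refusing multi-label inputs mirrors the abstract's point that this approximation holds only for single-class images.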

URL

https://arxiv.org/abs/2305.05154

PDF

https://arxiv.org/pdf/2305.05154.pdf