Paper Reading AI Learner

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

2023-05-15 09:46:28
Fangwen Wu, Jingxuan He, Lechao Cheng, Yufei Yin, Yanbin Hao, Gang Huang

Abstract

This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to emphasize semantic regions in weakly supervised semantic segmentation. MCC adroitly incorporates concepts from masked image modeling and contrastive learning to devise Transformer blocks that induce keys to contract towards semantically pertinent regions. Unlike prevalent techniques that directly eradicate patch regions in the input image when generating masks, we scrutinize the neighborhood relations of patch tokens by exploring masks considering keys on the affinity matrix. Moreover, we generate positive and negative samples in contrastive learning by utilizing the masked local output and contrasting it with the global output. Elaborate experiments on commonly employed datasets evidences that the proposed MCC mechanism effectively aligns global and local perspectives within the image, attaining impressive performance.

Abstract (translated)

这项研究介绍了一种有效的方法——遮蔽协同比较(MCC),用于在弱监督下语义分割中强调语义区域。MCC巧妙地融合了遮蔽图像建模和对比学习的概念,设计出了Transformer blocks,使得关键键开始向语义相关区域收缩。与通常在生成 masks 时直接消除输入图像中的局部区域的做法不同,我们仔细审查了 patch token 的邻居关系,通过探索masks,考虑键在匹配矩阵中的位置。此外,我们在对比学习中使用遮蔽的局部输出,与全球输出进行竞争,生成积极和消极样本。常用的数据集实验表明,提出的 MCC 机制有效地将图像中的全局和局部视角对齐,取得了令人印象深刻的性能。

URL

https://arxiv.org/abs/2305.08491

PDF

https://arxiv.org/pdf/2305.08491.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot