Abstract
This study introduces an effective approach, Masked Collaborative Contrast (MCC), to emphasize semantic regions in weakly supervised semantic segmentation. MCC combines ideas from masked image modeling and contrastive learning to design Transformer blocks that induce keys to contract toward semantically pertinent regions. Unlike prevalent techniques that generate masks by directly erasing patch regions in the input image, we examine the neighborhood relations of patch tokens by applying masks to keys on the affinity matrix. Moreover, we generate positive and negative samples for contrastive learning by contrasting the masked local output with the global output. Extensive experiments on commonly used datasets show that the proposed MCC mechanism effectively aligns global and local perspectives within the image, attaining impressive performance.
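The two ideas in the abstract, masking keys on the affinity matrix and contrasting a masked local output with the global output, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random key selection, the mean pooling, the InfoNCE-style loss, the temperature, and all function names (`attention`, `info_nce`) are assumptions chosen only to make the mechanism concrete.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, key_mask=None):
    """Single-head attention over N patch tokens.

    key_mask: boolean array of shape (N,), True where a key is hidden.
    The mask is applied to the affinity matrix, not to the input image.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # affinity matrix, shape (N, N)
    if key_mask is not None:
        logits = np.where(key_mask[None, :], -np.inf, logits)
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


def info_nce(local_vecs, global_vecs, temperature=0.1):
    """Assumed contrastive loss: the local and global vectors of the same
    image form the positive pair; global vectors of other images in the
    batch act as negatives."""
    l = local_vecs / np.linalg.norm(local_vecs, axis=1, keepdims=True)
    g = global_vecs / np.linalg.norm(global_vecs, axis=1, keepdims=True)
    sim = l @ g.T / temperature  # (B, B) cosine similarities
    m = sim.max(axis=1, keepdims=True)
    log_z = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_z
    return -np.mean(np.diag(log_prob))


rng = np.random.default_rng(0)
B, N, D = 4, 16, 8  # images, patch tokens per image, feature dimension
local_vecs, global_vecs = [], []
for _ in range(B):
    tokens = rng.normal(size=(N, D))  # stand-in for patch embeddings
    key_mask = np.zeros(N, dtype=bool)
    key_mask[rng.choice(N, size=N // 4, replace=False)] = True  # hide 25% of keys
    global_out, _ = attention(tokens, tokens, tokens)
    local_out, local_w = attention(tokens, tokens, tokens, key_mask)
    global_vecs.append(global_out.mean(axis=0))  # pooled global view
    local_vecs.append(local_out.mean(axis=0))    # pooled masked (local) view

loss = info_nce(np.stack(local_vecs), np.stack(global_vecs))
print(f"contrastive loss: {loss:.4f}")
```

Because the mask acts on columns of the affinity matrix, every query still produces an output; it simply cannot attend to the hidden keys, which differs from erasing patches in the input image.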
Abstract (translated)
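This work presents Masked Collaborative Contrast (MCC), an effective method for highlighting semantic regions in weakly supervised semantic segmentation. MCC brings together masked image modeling and contrastive learning to build Transformer blocks in which the keys are drawn toward semantically relevant regions. Rather than removing local regions of the input image when generating masks, as is commonly done, the method examines the neighborhood relations among patch tokens by masking keys on the affinity matrix. In addition, the masked local output is contrasted with the global output to form the positive and negative samples for contrastive learning. Experiments on commonly used datasets show that MCC effectively aligns the global and local views of an image and achieves impressive performance.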
URL
https://arxiv.org/abs/2305.08491