
IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

2024-04-29 17:27:37
Kebin Wu, Wenbin Li, Xiaofei Xiao

Abstract

The scarcity of labeled data in real-world scenarios is a critical bottleneck for deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution for achieving a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge embedded within inter-pixel relations. This neglect leads to suboptimal performance and limited generalization. In this paper, we propose a novel approach, IPixMatch, designed to mine the neglected but valuable inter-pixel information for semi-supervised learning. Specifically, IPixMatch is constructed as an extension of the standard teacher-student network, incorporating additional loss terms to capture inter-pixel relations. It shines in low-data regimes by efficiently leveraging the limited labeled data and extracting maximum utility from the available unlabeled data. Furthermore, IPixMatch can be integrated seamlessly into most teacher-student frameworks without the need for model modification or additional components. Our straightforward IPixMatch method demonstrates consistent performance improvements across various benchmark datasets under different partitioning protocols.
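To make the idea concrete, below is a minimal, hypothetical PyTorch-style sketch of how an inter-pixel relation term could be added on top of a standard teacher-student consistency loss. The abstract does not specify the exact loss formulation, so the pairwise-similarity construction, confidence threshold, temperature, and weighting factor `lam` here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): teacher-student pseudo-label consistency
# augmented with an inter-pixel relation alignment term. All specifics are assumptions.
import torch
import torch.nn.functional as F

def inter_pixel_relation(logits, temperature=1.0):
    """Pairwise similarity between pixels of each image, computed from class probabilities.

    In practice the logits would typically be pooled to a coarser grid first,
    since the N x N similarity matrix grows quadratically with resolution.
    """
    b, c, h, w = logits.shape
    p = F.softmax(logits / temperature, dim=1).reshape(b, c, h * w)  # B x C x N
    p = F.normalize(p, dim=1)                                        # unit-norm per pixel
    return torch.bmm(p.transpose(1, 2), p)                           # B x N x N similarities

def ipix_style_loss(student_logits, teacher_logits, conf_threshold=0.95, lam=0.5):
    """Hard pseudo-label consistency plus inter-pixel relation alignment (hypothetical weighting)."""
    with torch.no_grad():
        t_prob = F.softmax(teacher_logits, dim=1)
        conf, pseudo = t_prob.max(dim=1)                 # B x H x W confidence and pseudo labels
        mask = (conf >= conf_threshold).float()          # keep only confident pixels

    # Standard pixel-wise cross-entropy against teacher pseudo labels.
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")
    loss_consistency = (ce * mask).sum() / mask.sum().clamp(min=1.0)

    # Inter-pixel relation term: align student and teacher pairwise similarity maps.
    rel_student = inter_pixel_relation(student_logits)
    rel_teacher = inter_pixel_relation(teacher_logits).detach()
    loss_relation = F.smooth_l1_loss(rel_student, rel_teacher)

    return loss_consistency + lam * loss_relation
```

Because the relation term only adds extra loss computation on existing predictions, a sketch like this plugs into most teacher-student pipelines without changing the segmentation model itself, which is the integration property the abstract emphasizes.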

Abstract (translated)

The scarcity of labeled data in real-world scenarios is a critical bottleneck for the effectiveness of deep learning. Semi-supervised semantic segmentation is a common solution for achieving a desirable balance between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to overlook the contextual knowledge inherent in inter-pixel relations. This oversight leads to suboptimal performance and limited generalization. In this paper, we propose a novel method named IPixMatch, designed to mine the neglected but valuable inter-pixel information for semi-supervised learning. Specifically, IPixMatch is built as an extension of the standard teacher-student network, with additional loss terms to capture inter-pixel relations. It excels in low-data regimes by efficiently leveraging the limited labeled data and extracting maximum utility from the available unlabeled data. Moreover, IPixMatch can be seamlessly integrated into most teacher-student frameworks without modifying the model or adding extra components. Our straightforward IPixMatch method demonstrates consistent performance improvements across various benchmark datasets under different partitioning protocols.

URL

https://arxiv.org/abs/2404.18891

PDF

https://arxiv.org/pdf/2404.18891.pdf

