Paper Reading AI Learner

Group-On: Boosting One-Shot Segmentation with Supportive Query

2024-04-18 03:10:04
Hanjing Zhou, Mingze Yin, JinTai Chen, Danny Chen, Jian Wu

Abstract

One-shot semantic segmentation aims to segment query images given only ONE annotated support image of the same class. This task is challenging because target objects in the support and query images can differ greatly in appearance and pose (i.e., intra-class variation). Prior works suggested that incorporating more annotated support images in few-shot settings boosts performance, but at higher cost due to the additional manual labeling. In this paper, we propose a novel approach for ONE-shot semantic segmentation, called Group-On, which packs multiple query images into batches so that queries of the same category can provide mutual knowledge support. Specifically, after coarse segmentation masks of the batch of queries are predicted, query-mask pairs act as pseudo support data to mutually enhance mask predictions, under the guidance of a simple Group-On Voting module. Comprehensive experiments on three standard benchmarks show that, in the ONE-shot setting, our Group-On approach outperforms previous works by considerable margins. For example, on the COCO-20i dataset, we increase mIoU scores by 8.21% and 7.46% over the ASNet and HSNet baselines, respectively. With only one support image, Group-On can even be competitive with counterparts using 5 annotated support images.
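The pipeline the abstract describes — predict coarse masks for a batch of queries, then let each query-mask pair act as pseudo support for the other queries, fused by voting — can be illustrated with a rough NumPy sketch. All names here (`coarse_segment`, `group_on_vote`) and the prototype-matching segmenter are hypothetical simplifications for illustration only: the paper's actual baselines (HSNet, ASNet) and its Group-On Voting module are learned networks, not a cosine-prototype matcher or a plain average.

```python
import numpy as np

def coarse_segment(query_feats, support_feat, support_mask):
    """Toy stand-in for a one-shot baseline segmenter: score each query
    pixel by cosine similarity to the masked-average support prototype."""
    proto = (support_feat * support_mask[..., None]).sum(axis=(0, 1))
    proto /= support_mask.sum() + 1e-6
    sims = np.einsum('bhwc,c->bhw', query_feats, proto)
    norms = np.linalg.norm(query_feats, axis=-1) * np.linalg.norm(proto) + 1e-6
    return 1.0 / (1.0 + np.exp(-sims / norms))  # sigmoid of cosine score

def group_on_vote(query_feats, coarse_masks):
    """Group-On idea: each query with its coarse mask acts as a pseudo
    support for every OTHER query in the batch; refined masks average
    ('vote' over) the resulting re-segmentations."""
    b = query_feats.shape[0]
    refined = np.empty_like(coarse_masks)
    for i in range(b):
        votes = [coarse_segment(query_feats[i:i + 1], query_feats[j],
                                (coarse_masks[j] > 0.5).astype(float))
                 for j in range(b) if j != i]
        refined[i] = np.mean(np.stack(votes), axis=0)[0]
    return refined

# Toy batch of 3 query feature maps plus one annotated support image.
rng = np.random.default_rng(0)
query_feats = rng.normal(size=(3, 8, 8, 4))
support_feat = rng.normal(size=(8, 8, 4))
support_mask = (rng.random((8, 8)) > 0.5).astype(float)

coarse = coarse_segment(query_feats, support_feat, support_mask)
refined = group_on_vote(query_feats, coarse)
```

The key design point the sketch mirrors is that refinement needs no extra labels: the pseudo supports are the batch's own coarse predictions, so the annotation cost stays at ONE support image per class.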

Abstract (translated)

One-shot semantic segmentation aims to segment query images given only one annotated support image of the same class. This task is challenging because the target objects in the support and query images can differ greatly in appearance and pose (i.e., intra-class variation). Prior works suggested that incorporating more annotated support images in few-shot settings improves performance, but this increases costs due to the manual labeling required. In this paper, we propose a novel ONE-shot semantic segmentation method called Group-On, which packs multiple query images into batches to enable mutual knowledge support within the same category. Specifically, after coarse segmentation masks are predicted, query-mask pairs act as pseudo support data to mutually enhance mask predictions, under the guidance of a simple Group-On Voting module. Comprehensive experiments on three standard benchmarks show that, in the ONE-shot setting, our Group-On method significantly surpasses previous works. For example, on the COCO-20i dataset, we improve mIoU scores by 8.21% and 7.46%, respectively. Using only one support image, Group-On can even rival counterparts that use 5 annotated support images.

URL

https://arxiv.org/abs/2404.11871

PDF

https://arxiv.org/pdf/2404.11871.pdf

