Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain

2024-04-16 06:33:08
Steve Andreas Immanuel, Hagai Raja Sinulingga

Abstract

Few-shot segmentation is the task of segmenting objects or regions of novel classes in an image given only a few annotated examples. In the generalized setting, the task extends to segmenting both the base and the novel classes. The main challenge is training the model so that adding novel classes does not degrade performance on the base classes, a problem known as catastrophic forgetting. To mitigate this issue, we use SegGPT as our base model and train it on the base classes. Then, we use separate learnable prompts to handle predictions for each novel class. To handle the wide range of object sizes typical of the remote sensing domain, we perform patch-based prediction. To address the discontinuities along patch boundaries, we propose a patch-and-stitch technique that re-frames the problem as an image inpainting task. During inference, we also use image similarity search over image embeddings for prompt selection and novel class filtering to reduce false positive predictions. In our experiments, the proposed method boosts the weighted mIoU of a simple fine-tuned SegGPT from 15.96 to 35.08 on the validation set of the few-shot OpenEarthMap dataset provided in the challenge.
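
The prompt-selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which embedding model or similarity metric is used, so the cosine similarity below, the function names `cosine_similarity` and `select_prompts`, and the toy three-dimensional embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_prompts(query: Sequence[float],
                   support: List[Sequence[float]],
                   k: int = 1) -> List[int]:
    """Return indices of the k support embeddings most similar to the query, best first."""
    scores = [cosine_similarity(query, s) for s in support]
    ranked = sorted(range(len(support)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): three annotated support images and one query image,
# each already reduced to an embedding vector by some image encoder.
support_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_embedding = [0.9, 0.1, 0.0]

best = select_prompts(query_embedding, support_embeddings, k=2)
print(best)  # indices of the two support images closest to the query
```

In a setup like this, the support images at the returned indices would serve as the prompts for the query image. A similar score threshold could, under the same assumptions, decide whether a novel class is likely present at all before predicting it.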

URL

https://arxiv.org/abs/2404.10307

PDF

https://arxiv.org/pdf/2404.10307.pdf

