
A Unified Query-based Paradigm for Camouflaged Instance Segmentation

2023-08-14 18:23:18
Bo Dong, Jialun Pei, Rongrong Gao, Tian-Zhu Xiang, Shuo Wang, Huan Xiong

Abstract

Because camouflaged instances closely resemble their background, the recently proposed task of camouflaged instance segmentation (CIS) faces challenges in accurate localization and instance segmentation. To this end, inspired by query-based transformers, we propose a unified query-based multi-task learning framework for camouflaged instance segmentation, termed UQFormer. It builds a set of mask queries and a set of boundary queries to learn a shared composed query representation, and efficiently integrates global camouflaged object region and boundary cues for simultaneous instance segmentation and instance boundary detection in camouflaged scenarios. Specifically, we design a composed query learning paradigm that learns a shared representation capturing object region and boundary features through cross-attention between mask queries and boundary queries in the designed multi-scale unified learning transformer decoder. We then present a transformer-based multi-task learning framework that performs camouflaged instance segmentation and camouflaged instance boundary detection simultaneously on the learned composed query representation, which also forces the model to learn a strong instance-level query representation. Notably, our model treats instance segmentation as a query-based direct set prediction problem, requiring no post-processing such as non-maximum suppression. Compared with 14 state-of-the-art approaches, our UQFormer significantly improves the performance of camouflaged instance segmentation. Our code will be made available.
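
The composed-query idea in the abstract can be illustrated with a toy sketch. This is not the UQFormer implementation: it assumes a single attention head with no learned projections, assumes the composed query is simply the mask query plus the boundary information it attends to, and uses random vectors in place of real features. All function names (`cross_attention`, `compose_queries`, `predict_masks`) are hypothetical, and the code uses only the Python standard library so that it runs anywhere; a real model would use a tensor library and a multi-scale decoder.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention; context is both keys and values."""
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(context))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, context), weights


def compose_queries(mask_queries, boundary_queries):
    """ASSUMPTION: composed query = mask query + boundary cues it attends to."""
    attended, weights = cross_attention(mask_queries, boundary_queries)
    composed = [[m + a for m, a in zip(m_row, a_row)]
                for m_row, a_row in zip(mask_queries, attended)]
    return composed, weights


def predict_masks(composed_queries, pixel_features):
    """One binary mask per query; sigmoid(logit) > 0.5 is the same as logit > 0."""
    logits = matmul(composed_queries, transpose(pixel_features))
    return [[1 if value > 0 else 0 for value in row] for row in logits]


# Toy inputs: random vectors stand in for learned queries and pixel embeddings.
random.seed(0)
dim, num_queries, num_pixels = 4, 3, 6
mask_q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
boundary_q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
pixels = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_pixels)]

composed, attn = compose_queries(mask_q, boundary_q)
for mask in predict_masks(composed, pixels):
    print(mask)  # each query directly yields one candidate instance mask
```

Because every query emits its own mask, the output is already a set of instance candidates, which is why a query-based formulation of this kind needs no non-maximum suppression step to merge overlapping detections.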

URL

https://arxiv.org/abs/2308.07392

PDF

https://arxiv.org/pdf/2308.07392.pdf

