Paper Reading AI Learner

Spider: A Unified Framework for Context-dependent Concept Understanding

2024-05-02 04:58:29
Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu

Abstract

Unlike context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts such as camouflaged objects and medical lesions demand a higher level of visual understanding. Although many CD understanding tasks have advanced rapidly within their own branches, this isolated evolution limits cross-domain generalisation and leads to repetitive technical innovation. Because foreground and background context are strongly coupled in CD tasks, existing methods must train a separate model for each target domain. This restricts real-world CD concept understanding on the path towards artificial general intelligence (AGI). We propose Spider, a unified model with a single set of parameters that needs to be trained only once. With the help of the proposed concept filter, driven by an image-mask group prompt, Spider can understand and distinguish diverse strongly context-dependent concepts and accurately capture the prompter's intention. Without bells and whistles, Spider significantly outperforms state-of-the-art specialized models on 8 different context-dependent segmentation tasks: 4 natural-scene tasks (salient, camouflaged, and transparent objects, and shadows) and 4 medical-lesion tasks (COVID-19, polyp, breast, and skin lesions, imaged with CT, color colonoscopy, ultrasound, and dermoscopy, respectively). Spider also shows clear advantages in continuous learning: it can be trained on a new task by fine-tuning less than 1% of its parameters, at the cost of a tolerable performance degradation of less than 5% on all old tasks. The source code will be publicly available at Spider-UniCDSeg.

Abstract (translated)

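In contrast to context-independent (CI) concepts (such as human, car, and airplane), context-dependent (CD) concepts, for example camouflaged objects and medical lesions, call for stronger visual understanding. Many CD understanding tasks have developed rapidly in their own fields, but this isolated evolution has left them with limited cross-domain generalisation and duplicated technical innovation. Because foreground and background context are strongly coupled in CD tasks, existing methods have to train separate models on the domains they focus on, which limits their real-world CD concept understanding on the way to artificial general intelligence (AGI). We propose a unified model, Spider, that needs to be trained only once. Through a concept filter driven by image-mask group prompts, Spider can understand and distinguish a variety of strongly context-dependent concepts and accurately capture the prompter's intention. Without bells and whistles, Spider significantly outperforms the most advanced specialized models on 8 different context-dependent segmentation tasks, covering 4 natural scenes (salient, camouflaged, and transparent objects, and shadows) and 4 medical lesions (COVID-19, polyp, breast, and skin lesions, across color colonoscopy, CT, ultrasound, and dermoscopy). In addition, Spider shows a clear advantage in continuous learning: it can complete training on a new task by fine-tuning its parameters, with a tolerable performance drop of no more than 5% on all old tasks. The source code will be released publicly at Spider-UniCDSeg.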

URL

https://arxiv.org/abs/2405.01002

PDF

https://arxiv.org/pdf/2405.01002.pdf

