Abstract
Unlike context-independent (CI) concepts such as humans, cars, and airplanes, context-dependent (CD) concepts such as camouflaged objects and medical lesions demand a higher level of visual understanding. Although many CD understanding tasks have advanced rapidly within their respective branches, this isolated evolution has limited cross-domain generalisation and led to repetitive technical innovation. Because foreground and background context are strongly coupled in CD tasks, existing methods must train a separate model for each domain they focus on. This restricts real-world CD concept understanding on the path towards artificial general intelligence (AGI). We propose Spider, a unified model with a single set of parameters that only needs to be trained once. With the help of the proposed concept filter, driven by an image-mask group prompt, Spider can understand and distinguish diverse, strongly context-dependent concepts and accurately capture the prompter's intention. Without bells and whistles, Spider significantly outperforms state-of-the-art specialized models on 8 different context-dependent segmentation tasks: 4 in natural scenes (salient, camouflaged, and transparent objects, and shadows) and 4 on medical lesions (COVID-19, polyp, breast, and skin lesions, imaged with CT, color colonoscopy, ultrasound, and dermoscopy, respectively). Spider also shows clear advantages in continuous learning: it can be trained on a new task by fine-tuning less than 1\% of its parameters, at the cost of a tolerable performance degradation of less than 5\% on all old tasks. The source code will be publicly available at \href{this https URL}{Spider-UniCDSeg}.
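To make the two continuous-learning figures concrete, the sketch below shows one way such budgets could be checked. It is not the authors' code: the module names, parameter counts, and scores are invented for illustration, and reading the "less than 5\%" figure as a relative drop in a higher-is-better score is an assumption, since the abstract does not say how degradation is measured.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A named block of parameters (hypothetical, for illustration only)."""
    name: str
    num_params: int
    trainable: bool


def trainable_fraction(modules):
    """Share of all parameters that would be updated during fine-tuning."""
    total = sum(m.num_params for m in modules)
    tuned = sum(m.num_params for m in modules if m.trainable)
    return tuned / total


def relative_drop(before, after):
    """Relative degradation of a higher-is-better score (assumed definition)."""
    return (before - after) / before


# Made-up sizes and scores; none of these numbers come from the paper.
modules = [
    Module("backbone", 88_000_000, trainable=False),
    Module("decoder", 11_500_000, trainable=False),
    Module("new_task_adapter", 500_000, trainable=True),
]
# task name -> (score before adding the new task, score after)
old_task_scores = {"task_a": (0.90, 0.87), "task_b": (0.80, 0.79)}

fraction = trainable_fraction(modules)
drops = {t: relative_drop(b, a) for t, (b, a) in old_task_scores.items()}

print(f"trainable fraction: {fraction:.2%}")
for task, drop in drops.items():
    print(f"{task}: relative drop {drop:.2%}")
```

With these invented numbers, only the small adapter is updated, so the trainable share stays under the 1\% budget and both old-task drops stay under 5\%.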
URL
https://arxiv.org/abs/2405.01002