Abstract
Vision-Language Models (VLMs), particularly CLIP, have revolutionized anomaly detection by enabling zero-shot and few-shot defect identification without extensive labeled datasets. By learning aligned representations of images and text, VLMs facilitate anomaly classification and segmentation through natural language descriptions of normal and abnormal states, eliminating traditional requirements for task-specific training or defect examples. This project presents a comprehensive analysis of VLM-based approaches for anomaly classification (AC) and anomaly segmentation (AS). We systematically investigate key architectural paradigms including sliding window-based dense feature extraction (WinCLIP), multi-stage feature alignment with learnable projections (AprilLab framework), and compositional prompt ensemble strategies. Our analysis evaluates these methods across critical dimensions: feature extraction mechanisms, text-visual alignment strategies, prompt engineering techniques, zero-shot versus few-shot trade-offs, computational efficiency, and cross-domain generalization. Through rigorous experimentation on benchmarks such as MVTec AD and VisA, we compare classification accuracy, segmentation precision, and inference efficiency. The primary contribution is a foundational understanding of how and why VLMs succeed in anomaly detection, synthesizing practical insights for method selection and identifying current limitations. This work aims to facilitate informed adoption of VLM-based methods in industrial quality control and guide future research directions.
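As a rough illustration of the zero-shot mechanism the abstract describes (scoring an image against natural language descriptions of normal and abnormal states), the following minimal sketch compares one image embedding with two text embeddings and converts the similarities into an anomaly probability. The embeddings are toy vectors rather than real CLIP outputs, the function names `cosine` and `anomaly_score` are hypothetical, and the temperature of 0.07 is an assumed value; none of this is taken from the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_score(image_emb, normal_text_emb, abnormal_text_emb, temperature=0.07):
    """Return the softmax probability assigned to the 'abnormal' description.

    The temperature is an assumed value used only for illustration.
    """
    s_normal = cosine(image_emb, normal_text_emb) / temperature
    s_abnormal = cosine(image_emb, abnormal_text_emb) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


# Toy embeddings standing in for encoder outputs (assumptions, not real data).
normal_text = [1.0, 0.0, 0.0]    # e.g. a prompt describing a flawless object
abnormal_text = [0.0, 1.0, 0.0]  # e.g. a prompt describing a damaged object
good_image = [0.9, 0.1, 0.0]
defect_image = [0.2, 0.8, 0.1]

good_score = anomaly_score(good_image, normal_text, abnormal_text)
defect_score = anomaly_score(defect_image, normal_text, abnormal_text)
```

In this sketch, an image whose embedding lies closer to the "abnormal" description receives a score above 0.5. Applying the same scoring to embeddings of local windows or patches instead of the whole image would give a coarse anomaly map, which is the general idea behind dense, window-based approaches.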
URL
https://arxiv.org/abs/2601.13440