Paper Reading AI Learner

Multiresolution Feature Guidance Based Transformer for Anomaly Detection

2023-05-24 08:31:38
Shuting Yan, Pingping Chen, Honghui Chen, Huan Mao, Feng Chen, Zhijian Lin

Abstract

Anomaly detection is framed as an unsupervised learning task that identifies images deviating from normal images. In general, anomaly detection faces two main challenges: class imbalance and the unexpectedness of anomalies. In this paper, we propose GTrans, a Transformer-based multiresolution feature guidance method for unsupervised anomaly detection and localization. In GTrans, an Anomaly Guided Network (AGN) pre-trained on ImageNet provides surrogate labels for features and tokens. Under the tacit knowledge guidance of the AGN, the anomaly detection network, named Trans, uses a Transformer to relate features across multiple resolutions, which improves the ability of Trans to fit the normal data manifold. Owing to the strong generalization ability of the AGN, GTrans locates anomalies by comparing the differences in spatial distance and direction between the multi-scale features extracted from the AGN and those extracted from Trans. Our experiments demonstrate that the proposed GTrans achieves state-of-the-art performance in both detection and localization on the MVTec AD dataset, with image-level and pixel-level anomaly detection AUROC scores of 99.0% and 97.9%, respectively.
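The comparison of "spatial distance and direction" between multi-scale features can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `upsample_nearest` and `anomaly_map`, the use of mean squared difference for distance and one minus cosine similarity for direction, the equal weighting of the two terms, and the nearest-neighbour upsampling are all assumptions made for illustration. The feature maps stand in for the outputs of the AGN and Trans at each scale.

```python
import numpy as np


def upsample_nearest(score, size):
    """Nearest-neighbour resize of a 2-D score map to (size, size)."""
    h, w = score.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return score[rows][:, cols]


def anomaly_map(guide_feats, student_feats, out_size, eps=1e-8):
    """Average per-scale distance and direction discrepancies into one map.

    guide_feats, student_feats: lists of arrays shaped (C, H, W), one per
    scale (hypothetical stand-ins for AGN and Trans features).
    """
    total = np.zeros((out_size, out_size))
    for g, s in zip(guide_feats, student_feats):
        # Distance term: mean squared difference over channels.
        distance = np.mean((g - s) ** 2, axis=0)
        # Direction term: 1 - cosine similarity over channels.
        dot = np.sum(g * s, axis=0)
        norms = np.linalg.norm(g, axis=0) * np.linalg.norm(s, axis=0)
        direction = 1.0 - dot / (norms + eps)
        # Assumed equal weighting of the two terms.
        total += upsample_nearest(distance + direction, out_size)
    return total / len(guide_feats)


# Toy usage: two scales, with one corrupted location at the finer scale.
rng = np.random.default_rng(0)
g1 = rng.normal(size=(4, 8, 8))
g2 = rng.normal(size=(4, 4, 4))
s1, s2 = g1.copy(), g2.copy()
s1[:, 2, 3] = -g1[:, 2, 3]  # flip the feature direction at one position
amap = anomaly_map([g1, g2], [s1, s2], out_size=16)
peak = np.unravel_index(np.argmax(amap), amap.shape)
```

In this toy case the peak of `amap` falls inside the upsampled footprint of the corrupted position, while locations where the two feature sets agree score close to zero. A pixel-level map of this kind could then be thresholded for localization or reduced (for example by taking its maximum) to an image-level score.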

Abstract (translated)

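Anomaly detection is an unsupervised learning task that identifies images deviating from normal images. In general, anomaly detection tasks face two main challenges: class imbalance and the unexpectedness of anomalies. In this paper, we propose a Transformer-based multiresolution feature guidance method, named GTrans, for unsupervised anomaly detection and localization. In GTrans, we develop an Anomaly Guided Network (AGN), pre-trained on ImageNet, to provide surrogate labels for features and tokens. Under the guidance of the AGN, the anomaly detection network, named Trans, uses a Transformer to effectively relate multiresolution features, strengthening the ability of Trans to fit the normal data manifold. Because of the strong generalization ability of the AGN, GTrans locates anomalies by comparing the spatial distance and direction of the multi-scale features extracted from the AGN and from Trans. Our experiments show that the proposed GTrans achieves state-of-the-art detection and localization performance on the MVTec AD dataset. On MVTec AD, GTrans achieves image-level and pixel-level anomaly detection AUROC scores of 99.0% and 97.9%, respectively.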

URL

https://arxiv.org/abs/2305.14880

PDF

https://arxiv.org/pdf/2305.14880.pdf

