Paper Reading AI Learner

AnomalyXFusion: Multi-modal Anomaly Synthesis with Diffusion

2024-04-30 10:48:43
Jie Hu, Yawen Huang, Yilin Lu, Guoyang Xie, Guannan Jiang, Yefeng Zheng

Abstract

Anomaly synthesis is an effective way to augment the abnormal samples available for training. However, current anomaly synthesis methods rely predominantly on texture information as input, which limits the fidelity of the synthesized abnormal samples: texture information alone is insufficient to depict anomaly patterns correctly, especially for logical anomalies. To surmount this obstacle, we present the AnomalyXFusion framework, which harnesses multi-modal information to enhance the quality of synthesized abnormal samples. The AnomalyXFusion framework comprises two distinct yet synergistic modules: the Multi-modal In-Fusion (MIF) module and the Dynamic Dif-Fusion (DDF) module. The MIF module refines modality alignment by aggregating and integrating image, text, and mask features into a unified embedding space, termed X-embedding. Concurrently, the DDF module enables controlled generation by adaptively adjusting the X-embedding according to the diffusion step. In addition, to reveal the multi-modal representational power of AnomalyXFusion, we propose a new dataset called MVTec Caption, which extends the MVTec AD and LOCO datasets with 2.2k accurate image-mask-text annotations. Comprehensive evaluations demonstrate the effectiveness of AnomalyXFusion, especially in the fidelity and diversity of synthesized logical anomalies. Project page: http://github.com/hujiecpp/MVTec-Caption
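The abstract describes the two modules only at a high level. As a rough illustration of the general idea (projecting image, text, and mask features into one shared embedding, then re-weighting that embedding by diffusion step), here is a toy sketch in plain Python. Everything in it is an assumption: the class name `ToyXEmbedding`, the random linear projections, the weighted-sum fusion, and the softmax schedule over steps are hypothetical and are not the authors' MIF or DDF design; the paper itself gives the actual architecture.

```python
import math
import random


def linear(x, weight):
    """Multiply vector x (length d_in) by a weight matrix given as d_out rows of length d_in."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weight]


def random_matrix(d_out, d_in, rng):
    """Small random weights; stands in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyXEmbedding:
    """Hypothetical illustration only; not the paper's MIF/DDF modules."""

    MODALITIES = ("image", "text", "mask")

    def __init__(self, dims, d_model, num_steps, seed=0):
        if num_steps < 2:
            raise ValueError("num_steps must be at least 2")
        rng = random.Random(seed)
        # MIF-like step (assumed): one linear projection per modality into a shared d_model space.
        self.proj = {m: random_matrix(d_model, dims[m], rng) for m in self.MODALITIES}
        self.num_steps = num_steps

    def step_weights(self, t):
        """DDF-like step (assumed): per-modality weights that depend on diffusion step t.

        The schedule is arbitrary: it shifts weight toward the mask features at
        noisy steps (large t) and toward image/text features near t = 0.
        """
        s = t / (self.num_steps - 1)  # 1.0 at the noisiest step, 0.0 at the final step
        return softmax([1.0 - s, 1.0 - s, 2.0 * s])  # order: image, text, mask

    def __call__(self, feats, t):
        """Fuse the projected modality features into one step-dependent vector."""
        projected = [linear(feats[m], self.proj[m]) for m in self.MODALITIES]
        weights = self.step_weights(t)
        return [sum(w * p[i] for w, p in zip(weights, projected))
                for i in range(len(projected[0]))]


# Usage with made-up feature sizes and values.
dims = {"image": 4, "text": 3, "mask": 2}
model = ToyXEmbedding(dims, d_model=5, num_steps=1000)
feats = {"image": [0.1, 0.2, 0.3, 0.4], "text": [1.0, 0.0, -1.0], "mask": [1.0, 1.0]}
early = model(feats, t=999)  # noisiest step
late = model(feats, t=0)     # final denoising step
print(len(early), len(late))  # 5 5
```

Because the fusion weights come from a softmax, the conditioning vector in this sketch is always a convex combination of the projected modality features; a trained model would more plausibly compute such weights from learned parameters rather than a fixed schedule.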

URL

https://arxiv.org/abs/2404.19444

PDF

https://arxiv.org/pdf/2404.19444.pdf
