Abstract
Anomaly synthesis is one of the effective methods to augment abnormal samples for training. However, current anomaly synthesis methods predominantly rely on texture information as input, which limits the fidelity of synthesized abnormal samples. Because texture information is insufficient to correctly depict the pattern of anomalies, especially for logical anomalies. To surmount this obstacle, we present the AnomalyXFusion framework, designed to harness multi-modality information to enhance the quality of synthesized abnormal samples. The AnomalyXFusion framework comprises two distinct yet synergistic modules: the Multi-modal In-Fusion (MIF) module and the Dynamic Dif-Fusion (DDF) module. The MIF module refines modality alignment by aggregating and integrating various modality features into a unified embedding space, termed X-embedding, which includes image, text, and mask features. Concurrently, the DDF module facilitates controlled generation through an adaptive adjustment of X-embedding conditioned on the diffusion steps. In addition, to reveal the multi-modality representational power of AnomalyXFusion, we propose a new dataset, called MVTec Caption. More precisely, MVTec Caption extends 2.2k accurate image-mask-text annotations for the MVTec AD and LOCO datasets. Comprehensive evaluations demonstrate the effectiveness of AnomalyXFusion, especially regarding the fidelity and diversity for logical anomalies. Project page: http:github.com/hujiecpp/MVTec-Caption
Abstract (translated)
异常合成是一种有效的增强训练异常样本的方法。然而,现有的异常合成方法主要依赖于纹理信息作为输入,这限制了合成异常样本的保真度。因为纹理信息不足以正确地描绘异常的图案,尤其是对于逻辑异常。为克服这一障碍,我们提出了AnomalyXFusion框架,旨在利用多模态信息提高合成异常样本的质量。AnomalyXFusion框架包括两个不同的但相互作用的模块:多模态In-Fusion(MIF)模块和动态Dif-Fusion(DDF)模块。MIF模块通过聚合和整合各种模态特征到统一的嵌入空间X-embedding中,称为X-嵌入,来优化模态对齐。同时,DDF模块通过根据扩散步骤自适应调整X-嵌入来促进控制生成。此外,为了揭示AnomalyXFusion的多模态表示能力,我们提出了一个新的数据集,称为MVTec Caption。更具体地说,MVTec Caption扩展了MVTec AD和LOCO数据集中的2200个准确图像-纹理-文本注释。全面的评估证明了AnomalyXFusion的有效性,特别是对于逻辑异常的保真度和多样性。项目页面:http:github.com/hujiecpp/MVTec-Caption
URL
https://arxiv.org/abs/2404.19444