Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

2024-05-03 06:03:37
Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

Abstract

Steel surface defect recognition is an industrial problem of great practical value. Data insufficiency is the major challenge in training a robust defect recognition network. Existing methods enlarge the dataset by generating samples with generative models, but their generation quality is still limited by the scarcity of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in the Stable Diffusion model to steel surface defect image generation. To bridge the distinct distribution gap between steel surface images and the images generated by the diffusion model, we propose two processes. First, we align the distributions by adapting the parameters of the diffusion model, in both the token embedding space and the network parameter space. Second, in the generation process, we propose image-oriented generation rather than generation from pure Gaussian noise. We conduct extensive experiments on a steel surface defect dataset, demonstrating state-of-the-art performance in generating high-quality samples and in training recognition models, and showing that both designed processes contribute significantly to performance.
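The abstract does not spell out how image-oriented generation is initialised. One common reading is that sampling starts from a partially noised real image instead of from pure Gaussian noise. The sketch below illustrates only that starting point, under assumptions that are not taken from the paper: a DDPM-style linear beta schedule, a hypothetical `strength` parameter that picks the starting timestep, and a random array standing in for a steel surface image (Stable Diffusion itself operates on latents, not raw pixels).

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_image(x0, t, alpha_bar, rng):
    """Forward-diffuse x0 to timestep t: sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()

# Stand-in for a steel surface image scaled to [-1, 1] (hypothetical data).
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))

# Hypothetical knob: 0 keeps the image unchanged, 1 is close to pure noise.
strength = 0.5
t_start = int(strength * len(alpha_bar)) - 1

# x_t would be the starting point handed to the reverse (denoising) process.
x_t = noise_image(x0, t_start, alpha_bar, rng)
```

Because `x_t` still carries part of the source image, a denoiser started from it is steered toward samples that resemble that image, which is the intuition behind starting from an image rather than from noise alone.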

URL

https://arxiv.org/abs/2405.01872

PDF

https://arxiv.org/pdf/2405.01872.pdf

