OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

2024-03-18 07:41:39
Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

Abstract

Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These include limited sample sizes, which hinder effective feature learning, variations among source domains, and sensitivities to changes in lighting and camera positions during imaging. These factors collectively compromise the accuracy of model predictions. Traditional AOI often fails to capitalize on the rich mechanism-parameter information from machines or inside images, including statistical parameters, which typically benefit AOI classification. To address this, we introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images as a second modality to enhance performance, termed OANet (Ocr-Aoi-Net). A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network. This synergy enables a more refined fusion of semantic representations from different modalities. We further introduce feature refinement and a gating function in our OANet to optimize the combination of these features, enhancing inference and decision-making capabilities. Experimental outcomes show that our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
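
The abstract names three steps: aligning the OCR-derived features with the image features, then combining them through a gating function. The sketch below illustrates one common way such a gated fusion can be written. It is not the authors' OANet implementation; the function names, the linear alignment, and the sigmoid gate over concatenated features are all assumptions made for illustration, and the feature refinement step is omitted.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_fusion(img_feat, ocr_feat, w_align, w_gate, b_gate):
    """Fuse an image embedding with OCR-derived features (hypothetical sketch).

    img_feat: d_img values, e.g. the output of a CNN encoder
    ocr_feat: d_ocr values, e.g. statistical parameters read by OCR
    w_align:  d_img rows of d_ocr weights, projects OCR features to image space
    w_gate:   d_img rows of 2 * d_img weights; b_gate: d_img biases
    """
    aligned = matvec(w_align, ocr_feat)
    # The gate sees both modalities and outputs one weight per dimension.
    logits = matvec(w_gate, img_feat + aligned)
    gate = [sigmoid(z + b) for z, b in zip(logits, b_gate)]
    # Per-dimension convex combination of the two modalities.
    return [g * i + (1.0 - g) * a for g, i, a in zip(gate, img_feat, aligned)]


fused = gated_fusion(
    img_feat=[1.0, 2.0],
    ocr_feat=[3.0],
    w_align=[[1.0], [2.0]],
    w_gate=[[0.0] * 4, [0.0] * 4],
    b_gate=[0.0, 0.0],
)
print(fused)
```

With all gate weights at zero the gate is 0.5 in every dimension, so the result is the plain average of the image features and the aligned OCR features. In a trained model the gate would instead learn when to trust each modality, for example leaning on the OCR parameters when lighting makes the image features unreliable.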

URL

https://arxiv.org/abs/2403.11536

PDF

https://arxiv.org/pdf/2403.11536.pdf

