Abstract
Automatic optical inspection (AOI) plays a pivotal role in manufacturing, predominantly leveraging high-resolution imaging instruments for scanning. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, deploying models for AOI often faces challenges: limited sample sizes that hinder effective feature learning, variations among source domains, and sensitivity to changes in lighting and camera positions during imaging. These factors collectively compromise the accuracy of model predictions. Traditional AOI often fails to capitalize on the rich mechanism-parameter information from machines or embedded within images, including statistical parameters, which typically benefit AOI classification. To address this, we introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images as a second modality to enhance performance, termed OANet (Ocr-Aoi-Net). A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network. This synergy enables a more refined fusion of semantic representations from different modalities. We further introduce feature refinement and a gating function in our OANet to optimize the combination of these features, enhancing inference and decision-making capabilities. Experimental outcomes show that our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
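The abstract describes fusing CNN-encoded image features with OCR-derived statistical features through a gating function. The sketch below illustrates one common form of such a gate: a sigmoid over the concatenated modality features yields per-dimension weights for a convex combination. This is a minimal illustrative sketch only; the gate parameters (`W_gate`, `b_gate`), feature dimension, and exact fusion form are assumptions, not the paper's actual OANet architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8  # assumed feature dimension for illustration

# Stand-ins for the two modality features the abstract mentions:
img_feat = rng.standard_normal(d)  # CNN-encoded image features
ocr_feat = rng.standard_normal(d)  # OCR-extracted statistical features

# Hypothetical learned gate parameters (randomly initialized here)
W_gate = rng.standard_normal((d, 2 * d)) * 0.1
b_gate = np.zeros(d)

# Gate computed from both modalities, then a per-dimension
# convex combination of the two feature vectors
g = sigmoid(W_gate @ np.concatenate([img_feat, ocr_feat]) + b_gate)
fused = g * img_feat + (1.0 - g) * ocr_feat
```

Because the gate output lies in (0, 1), each fused coordinate stays between the corresponding image and OCR feature values, letting the model learn how much to trust each modality per dimension.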
URL
https://arxiv.org/abs/2403.11536