Abstract
Ensuring that every vehicle leaving a modern production line is built to the correct \emph{variant} specification and is free from visible defects is an increasingly complex challenge. We present the \textbf{Automated Vehicle Inspection (AVI)} platform, an end-to-end, \emph{multi-view} perception system that couples deep-learning detectors with a semantic rule engine to deliver \emph{variant-aware} quality control in real time. Eleven synchronized cameras capture a full 360° sweep of each vehicle; task-specific views are then routed to specialised modules: YOLOv8 for part detection, EfficientNet for ICE/EV classification, Gemini-1.5 Flash for mascot OCR, and YOLOv8-Seg for scratch-and-dent segmentation. A view-aware fusion layer standardises the evidence, and a VIN-conditioned rule engine compares the detected features against the expected manifest, producing an interpretable pass/fail report in \(\approx\! 300\,\text{ms}\). On a mixed data set that combines Original Equipment Manufacturer (OEM) vehicle data for four distinct models with public scratch/dent images, AVI achieves \textbf{93\,\%} verification accuracy and \textbf{86\,\%} defect-detection recall, and sustains \(\mathbf{3.3}\) vehicles/min, surpassing single-view and no-segmentation baselines by large margins. To our knowledge, this is the first publicly reported system that unifies multi-camera feature validation with defect detection in a deployable industrial automotive setting.
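The fusion and rule-engine steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuse_views` and `verify`, the max-confidence fusion rule, the 0.5 threshold, the feature names, and the placeholder VIN key are all assumptions made for illustration. It shows only the general idea of reducing per-view detections to one decision per feature and comparing that against a VIN-indexed manifest.

```python
from typing import Dict, List, Tuple

# (camera view, feature name, detector confidence); all values are illustrative.
Detection = Tuple[str, str, float]


def fuse_views(detections: List[Detection], threshold: float = 0.5) -> Dict[str, bool]:
    """Assumed fusion rule: keep the highest confidence seen for each feature
    across all views, then threshold it into present / absent."""
    best: Dict[str, float] = {}
    for _view, feature, conf in detections:
        best[feature] = max(conf, best.get(feature, 0.0))
    return {feature: conf >= threshold for feature, conf in best.items()}


def verify(expected: Dict[str, bool], observed: Dict[str, bool]) -> dict:
    """Compare fused observations with the expected manifest and list every
    mismatch so that the pass/fail decision stays interpretable."""
    mismatches = []
    for feature, should_be_present in expected.items():
        present = observed.get(feature, False)  # never detected counts as absent
        if present != should_be_present:
            mismatches.append(
                {"feature": feature, "expected": should_be_present, "observed": present}
            )
    return {"passed": not mismatches, "mismatches": mismatches}


# Hypothetical manifest keyed by a placeholder VIN (not a real identifier).
manifest = {
    "VIN_PLACEHOLDER_1": {"sunroof": True, "roof_rails": False, "fog_lamp": True},
}

detections = [
    ("front", "fog_lamp", 0.91),
    ("top", "sunroof", 0.32),
    ("top_rear", "sunroof", 0.44),
    ("left", "roof_rails", 0.12),
]

report = verify(manifest["VIN_PLACEHOLDER_1"], fuse_views(detections))
print(report)
```

In this example the sunroof never reaches the threshold in any view, so the report fails and names that single feature. A production system would likely weight views by how well each one can see a given part rather than taking a plain maximum.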
Abstract (translated)
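Ensuring that every vehicle coming off a modern production line matches the correct variant specification and has no visible defects is an increasingly complex challenge. We introduce an end-to-end, multi-view perception system, the **Automated Vehicle Inspection (AVI) platform**, which combines deep-learning detectors with a semantic rule engine to achieve real-time "variant-aware" quality control. Eleven synchronized cameras capture a full 360-degree view of each vehicle; task-specific views are then routed to specialised modules: YOLOv8 for part detection, EfficientNet for ICE/EV classification, Gemini-1.5 Flash for mascot OCR (optical character recognition), and YOLOv8-Seg for scratch-and-dent segmentation. A view-aware fusion layer standardises the evidence, while a VIN-conditioned rule engine compares the detected features against the expected manifest and generates an interpretable pass/fail report in roughly 300 ms. The AVI platform was evaluated on a mixed data set comprising Original Equipment Manufacturer (OEM) vehicle data for four distinct models and public scratch/dent images. It achieves 93% verification accuracy and 86% defect-detection recall, processes 3.3 vehicles per minute, and shows a clear advantage over single-view and no-segmentation baselines. To our knowledge, this is the first publicly reported system deployed in industry that unifies multi-camera feature validation with defect detection.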
URL
https://arxiv.org/abs/2509.26454