
Multi Modal Attention Networks with Uncertainty Quantification for Automated Concrete Bridge Deck Delamination Detection

2025-12-23 07:16:18
Alireza Moayedikia, Sattar Dorafshan

Abstract

Deteriorating civil infrastructure requires automated inspection techniques that overcome the limitations of visual assessment. While Ground Penetrating Radar and Infrared Thermography enable subsurface defect detection, single-modal approaches face complementary constraints: radar struggles with moisture and shallow defects, while thermography exhibits weather dependency and limited depth. This paper presents a multi-modal attention network fusing radar temporal patterns with thermal spatial signatures for bridge deck delamination detection. Our architecture introduces temporal attention for radar processing, spatial attention for thermal features, and cross-modal fusion with learnable embeddings that discover complementary defect patterns invisible to individual sensors. We incorporate uncertainty quantification through Monte Carlo dropout and learned variance estimation, decomposing uncertainty into epistemic and aleatoric components for safety-critical decisions. Experiments on five bridge datasets reveal that on balanced to moderately imbalanced data, our approach substantially outperforms baselines in accuracy and AUC, representing meaningful improvements over single-modal and concatenation-based fusion. Ablation studies demonstrate that cross-modal attention provides critical gains beyond within-modality attention, while multi-head mechanisms achieve improved calibration. Uncertainty quantification reduces calibration error, enabling selective prediction by rejecting uncertain cases. However, under extreme class imbalance, attention mechanisms show vulnerability to majority-class collapse. These findings provide actionable guidance: attention-based architectures perform well across typical scenarios, while extreme imbalance requires specialized techniques. Our system maintains deployment efficiency, enabling real-time inspection with characterized capabilities and limitations.
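
The abstract mentions decomposing uncertainty into epistemic and aleatoric components and rejecting uncertain cases. The following is a minimal sketch of one common way to do this, assuming class probabilities have already been collected from several stochastic (dropout-enabled) forward passes. It uses an entropy-based decomposition, which is an assumption on our part and not necessarily the authors' formulation (the paper also uses learned variance estimation, which is not shown). The function names, the toy probabilities, and the rejection threshold are all hypothetical.

```python
import numpy as np

def decompose_uncertainty(probs, eps=1e-12):
    """Split predictive uncertainty from Monte Carlo samples.

    probs: array of shape (T, N, C) holding class probabilities from
           T stochastic forward passes over N samples and C classes.
    Returns the mean probabilities plus total, aleatoric, and epistemic
    uncertainty per sample (entropy-based; an illustrative assumption).
    """
    mean_probs = probs.mean(axis=0)                                   # (N, C)
    # Total uncertainty: entropy of the averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)   # (N,)
    # Aleatoric part: average entropy of the individual passes.
    per_pass = -np.sum(probs * np.log(probs + eps), axis=-1)          # (T, N)
    aleatoric = per_pass.mean(axis=0)                                 # (N,)
    # Epistemic part: disagreement between passes (mutual information).
    epistemic = total - aleatoric
    return mean_probs, total, aleatoric, epistemic

def selective_predict(mean_probs, total, threshold):
    """Return predicted labels and a mask of samples confident enough to keep."""
    preds = mean_probs.argmax(axis=-1)
    accepted = total <= threshold
    return preds, accepted

# Toy data: 2 passes, 2 samples, 2 classes (sound vs. delaminated).
# Sample 0: both passes agree. Sample 1: the passes disagree.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
])

mean_probs, total, aleatoric, epistemic = decompose_uncertainty(probs)
preds, accepted = selective_predict(mean_probs, total, threshold=0.5)
print("epistemic:", epistemic)
print("accepted:", accepted)
```

In this toy case the sample on which the passes disagree carries almost all of the epistemic uncertainty and is the one rejected, which is the behavior selective prediction relies on. The threshold would in practice be tuned on validation data.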


URL

https://arxiv.org/abs/2512.20113

PDF

https://arxiv.org/pdf/2512.20113.pdf

