Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation

2023-12-21 17:23:49
Rasha Alshawi, Md Tamjidul Hoque, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Kendall Niles, Ken Prathak, Joe Tom, Jordan Klein, Murtada Mousa, Johny Javier Lopez

Abstract

The proposed architecture, Dual Attentive U-Net with Feature Infusion (DAU-FI Net), addresses challenges in semantic segmentation, particularly on multiclass imbalanced datasets with limited samples. DAU-FI Net integrates multiscale spatial-channel attention mechanisms and feature injection to enhance precision in object localization. The core employs a multiscale depth-separable convolution block that captures localized patterns across scales. This block is complemented by a spatial-channel squeeze and excitation (scSE) attention unit, which models inter-dependencies between channels and spatial regions in feature maps. Additionally, additive attention gates refine segmentation by connecting encoder-decoder pathways. To augment the model, engineered features (Gabor filters for texture analysis, Sobel and Canny filters for edge detection) are injected, guided by semantic masks, to expand the feature space strategically. Comprehensive experiments on a challenging sewer pipe and culvert defect dataset and a benchmark dataset validate DAU-FI Net's capabilities. Ablation studies highlight incremental benefits from attention blocks and feature injection. DAU-FI Net achieves state-of-the-art mean Intersection over Union (IoU) of 95.6% on the defect test set and 98.8% on the benchmark, surpassing prior methods by 8.9% and 12.6%, respectively. The proposed architecture provides a robust solution, advancing semantic segmentation for multiclass problems with limited training data. Our sewer-culvert defects dataset, featuring pixel-level annotations, opens avenues for further research in this crucial domain. Overall, this work delivers key innovations in architecture, attention, and feature engineering to elevate semantic segmentation efficacy.
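
For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean IoU can be computed for a multiclass label map. It is not the authors' evaluation code: the function name `mean_iou`, the nested-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the abstract does not state which averaging convention the paper uses.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two 2D label maps (lists of rows).

    Assumption: classes that appear in neither map are skipped rather than
    counted as IoU 0 or 1. Other conventions exist.
    """
    ious = []
    for c in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == c
                in_target = t == c
                if in_pred and in_target:
                    intersection += 1
                if in_pred or in_target:
                    union += 1
        if union > 0:
            ious.append(intersection / union)
    # No class present at all: the metric is undefined.
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 label maps with three classes (0, 1, 2).
pred = [[0, 0, 1],
        [1, 2, 2]]
target = [[0, 1, 1],
          [1, 2, 0]]
print(mean_iou(pred, target, num_classes=3))
```

Because every class counts equally in the average, a rare defect class affects the score as much as a dominant background class, which is why mean IoU is a common choice on imbalanced segmentation datasets like the one described here.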

URL

https://arxiv.org/abs/2312.14053

PDF

https://arxiv.org/pdf/2312.14053.pdf

