YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism

2024-06-18 09:30:21
Sompote Youwai, Achitaphon Chaiyaphat, Pawarotorn Chaipetch

Abstract

Maintaining road pavement integrity is crucial for ensuring safe and efficient transportation. Conventional methods for assessing pavement condition are often laborious and susceptible to human error. This paper proposes YOLO9tr, a novel lightweight object detection model for pavement damage detection, leveraging the advancements of deep learning. YOLO9tr is based on the YOLOv9 architecture, incorporating a partial attention block that enhances feature extraction and attention mechanisms, leading to improved detection performance in complex scenarios. The model is trained on a comprehensive dataset comprising road damage images from multiple countries, including an expanded set of damage categories beyond the standard four. This broadened classification range allows for a more accurate and realistic assessment of pavement conditions. Comparative analysis demonstrates YOLO9tr's superior precision and inference speed compared to state-of-the-art models like YOLO8, YOLO9 and YOLO10, achieving a balance between computational efficiency and detection accuracy. The model achieves a high frame rate of up to 136 FPS, making it suitable for real-time applications such as video surveillance and automated inspection systems. The research presents an ablation study to analyze the impact of architectural modifications and hyperparameter variations on model performance, further validating the effectiveness of the partial attention block. The results highlight YOLO9tr's potential for practical deployment in real-time pavement condition monitoring, contributing to the development of robust and efficient solutions for maintaining safe and functional road infrastructure.

URL

https://arxiv.org/abs/2406.11254

PDF

https://arxiv.org/pdf/2406.11254.pdf

