Real-time Object and Event Detection Service through Computer Vision and Edge Computing

2025-04-15 23:11:42
Marcos Mendes, Gonçalo Perna, Pedro Rito, Duarte Raposo, Susana Sargento

Abstract

The World Health Organization estimates that road traffic crashes cost approximately 518 billion dollars globally each year, which accounts for 3% of the gross domestic product for most countries. Most fatal road accidents in urban areas involve Vulnerable Road Users (VRUs). Smart city environments offer innovative approaches to reducing accidents through cutting-edge technologies, including advanced sensors, extensive datasets, Machine Learning (ML) models, communication systems, and edge computing. This paper proposes a strategy for, and an implementation of, a road monitoring and safety system for smart cities, based on Computer Vision (CV) and edge computing. Promising results were obtained by implementing vision and tracking algorithms using surveillance cameras that are part of a Smart City testbed, the Aveiro Tech City Living Lab (ATCLL). The algorithm accurately detects and tracks cars, pedestrians, and bicycles, while predicting the road state and the distance between moving objects, and inferring collision events in order to prevent collisions, in near real-time.

URL

https://arxiv.org/abs/2504.11662

PDF

https://arxiv.org/pdf/2504.11662.pdf

