Paper Reading AI Learner

Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D

2024-03-18 23:18:40
Benjamín Ojeda-Magaña, Rubén Ruelas, José Guadalupe Robledo-Hernández, Víctor Manuel Rangel-Cobián, Fernando López Aguilar-Hernández

Abstract

3D sensors, also known as RGB-D sensors, utilize depth images where each pixel measures the distance from the camera to objects, using principles like structured light or time-of-flight. Advances in artificial vision have led to affordable 3D cameras capable of real-time object detection without object movement, surpassing 2D cameras in information depth. These cameras can identify objects of varying colors and reflectivities and are less affected by lighting changes. The described prototype uses RGB-D sensors for bidirectional people counting in venues, aiding security and surveillance in spaces like stadiums or airports. It determines real-time occupancy and checks against maximum capacity, crucial during emergencies. The system includes a RealSense D415 depth camera and a mini-computer running object detection algorithms to count people and a 2D camera for identity verification. The system supports statistical analysis and uses C++, Python, and PHP with OpenCV for image processing, demonstrating a comprehensive approach to monitoring venue occupancy.

Abstract (translated)

3D传感器,也称为RGB-D传感器,利用深度图,其中每个像素测量相机到物体的距离,利用结构光或时间测距等原理。人工智能的进步使得价格实惠的3D相机能够实现实时物体检测,超过2D相机在信息深度方面的表现。这些相机可以识别各种颜色和反射率的物体,对光线变化的影响较小。描述的原型使用RGB-D传感器进行场馆双向人员计数,帮助体育场馆或机场等空间的安保和监控。它实时确定空位并检查最大容量,在紧急情况下至关重要。系统包括一个RealSense D415深度相机和一个运行物体检测算法的迷你计算机,以及一个2D相机用于身份验证。该系统支持统计分析,并使用C++、Python和PHP与OpenCV进行图像处理,展示了全面监测场馆占有率的方法。

URL

https://arxiv.org/abs/2403.12310

PDF

https://arxiv.org/pdf/2403.12310.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot