
A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model

2024-02-05 18:53:54
Murad Hasan, Shahriar Iqbal, Md. Billal Hossain Faisal, Md. Musnad Hossin Neloy, Md. Tonmoy Kabir, Md. Tanzim Reza, Md. Golam Rabiul Alam, Md Zia Uddin

Abstract

Criminal and suspicious activity detection has become a popular research topic in recent years, and the rapid growth of computer vision technologies has had a crucial impact on this problem. However, physical stalking detection remains a less explored area despite these advances. Stalking in public places has become a common occurrence, with women being the most affected. It is a visible action that usually precedes other criminal activity: the stalker follows, loiters near, and stares at the victim before committing a crime such as assault, kidnapping, or rape. Detecting stalking is therefore necessary, since such crimes can be stopped before they happen if the stalking itself is detected. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to classify a scenario as stalking or non-stalking. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract spatio-temporal features. We fuse these numerical and spatio-temporal features to build a classifier that detects stalking incidents. Additionally, we introduce a dataset of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, which achieves 89.58% testing accuracy, a significant improvement over state-of-the-art approaches.
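
The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define how relative distance is computed or how the two branches are fused, so the centre-to-centre distance normalised by frame width, the fusion by concatenation, and every name below (relative_distance, dense, fuse_and_classify, the weights) are assumptions made for illustration only. The embeddings stand in for the outputs of the MLP branch and the CNN-LSTM branch.

```python
import math
from typing import List, Sequence


def relative_distance(box_a: Sequence[float], box_b: Sequence[float],
                      frame_width: float) -> float:
    """Assumed feature: distance between the centres of two
    (x1, y1, x2, y2) boxes, normalised by the frame width."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(ax - bx, ay - by) / frame_width


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer with ReLU: one output per weight row."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def fuse_and_classify(numeric_embedding: Sequence[float],
                      video_embedding: Sequence[float],
                      weights: Sequence[float], bias: float) -> float:
    """Assumed fusion: concatenate both embeddings, then apply a single
    linear unit with a sigmoid to get a stalking probability."""
    fused = list(numeric_embedding) + list(video_embedding)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical per-frame numeric features: [relative distance, head yaw].
features = [relative_distance((0, 0, 10, 10), (30, 40, 40, 50), 100.0), 0.2]
# Hypothetical MLP branch with two hidden units.
numeric_embedding = dense(features, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# Placeholder for the CNN-LSTM branch output (not computed here).
video_embedding = [0.1, 0.0, 0.3]
probability = fuse_and_classify(numeric_embedding, video_embedding,
                                [0.5, 0.5, 1.0, 1.0, 1.0], -0.5)
```

In a real system the weights would be learned jointly and the video embedding would come from convolutional layers followed by an LSTM over the frame sequence; the sketch only shows where the two feature streams meet.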

URL

https://arxiv.org/abs/2402.03417

PDF

https://arxiv.org/pdf/2402.03417.pdf

