A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model

Abstract

Detecting criminal and suspicious activity has become a popular research topic in recent years, and the rapid growth of computer vision technologies has been crucial to progress on it. Physical stalking detection, however, remains a less explored area. Stalking in public places has become a common occurrence, with women being the most affected. Stalking is a visible behavior that usually precedes other criminal activity: the stalker follows, loiters near, and stares at the victim before committing a crime such as assault, kidnapping, or rape. Detecting stalking early is therefore necessary, because it offers a chance to stop these crimes before they occur. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. These data are fed into a multilayer perceptron (MLP) to classify a scenario as stalking or non-stalking. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract spatio-temporal features. We fuse these numerical and spatio-temporal features to build a classifier that detects stalking incidents. Additionally, we introduce a dataset of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, which achieves 89.58% testing accuracy, a significant improvement over state-of-the-art approaches.
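The two-branch design described in the abstract can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the fusion operator, or the input resolution, so the 3x3 kernels, the hidden sizes, the six numerical features, the use of concatenation for fusion, and all function and parameter names below are hypothetical. The weights are random and untrained, so the output is not a meaningful prediction; the code only shows how frame-level spatial features, an LSTM over time, and an MLP over numerical features could be combined into a single stalking score.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_features(frame, kernels):
    """Spatial step: valid 2D cross-correlation per kernel, ReLU, global average pool."""
    n_k, kh, kw = kernels.shape
    h, w = frame.shape
    feats = np.empty(n_k)
    for k in range(n_k):
        acc = np.empty((h - kh + 1, w - kw + 1))
        for i in range(acc.shape[0]):
            for j in range(acc.shape[1]):
                acc[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernels[k])
        feats[k] = np.maximum(acc, 0.0).mean()
    return feats

def lstm_last_hidden(seq, W, U, b):
    """Temporal step: single-layer LSTM over the sequence; returns the final hidden state."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in seq:
        z = W @ x + U @ h + b              # stacked pre-activations, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)        # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def mlp_hidden(x, W1, b1):
    """Numerical branch: one ReLU layer over the per-clip numerical features."""
    return np.maximum(x @ W1 + b1, 0.0)

def init_params(rng, n_kernels=4, hidden=8, n_numeric=6, mlp_units=8):
    """Random, untrained weights; every size here is an assumption for illustration."""
    return {
        "kernels": rng.normal(size=(n_kernels, 3, 3)),
        "W": rng.normal(scale=0.1, size=(4 * hidden, n_kernels)),
        "U": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "W1": rng.normal(scale=0.1, size=(n_numeric, mlp_units)),
        "b1": np.zeros(mlp_units),
        "w_out": rng.normal(scale=0.1, size=hidden + mlp_units),
        "b_out": 0.0,
    }

def predict_stalking(frames, numeric, p):
    """Fuse both branches by concatenation (assumed) and apply a sigmoid classifier."""
    spatial = np.stack([conv_features(f, p["kernels"]) for f in frames])  # (T, n_kernels)
    temporal = lstm_last_hidden(spatial, p["W"], p["U"], p["b"])          # (hidden,)
    tabular = mlp_hidden(numeric, p["W1"], p["b1"])                       # (mlp_units,)
    fused = np.concatenate([temporal, tabular])
    return float(sigmoid(fused @ p["w_out"] + p["b_out"]))

rng = np.random.default_rng(0)
params = init_params(rng)
frames = rng.random((5, 16, 16))   # 5 grayscale frames of 16x16 pixels (placeholder data)
numeric = rng.random(6)            # placeholder for landmark, head pose, distance values
prob = predict_stalking(frames, numeric, params)
print(f"stalking probability (untrained weights): {prob:.3f}")
```

In practice the convolutional and recurrent layers would come from a deep learning framework and be trained end to end on labeled clips; the explicit loops here only make the data flow between the branches visible.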

URL

https://arxiv.org/abs/2402.03417

PDF

https://arxiv.org/pdf/2402.03417.pdf
