Paper Reading AI Learner

Detecting Violence in Video Based on Deep Features Fusion Technique

2022-04-15 12:51:20
Heyam M. Bin Jahlan, Lamiaa A. Elrefaei

Abstract

With the rapid growth of surveillance cameras in many public places to mon-itor human activities such as in malls, streets, schools and, prisons, there is a strong demand for such systems to detect violence events automatically. Au-tomatic analysis of video to detect violence is significant for law enforce-ment. Moreover, it helps to avoid any social, economic and environmental damages. Mostly, all systems today require manual human supervisors to de-tect violence scenes in the video which is inefficient and inaccurate. in this work, we interest in physical violence that involved two persons or more. This work proposed a novel method to detect violence using a fusion tech-nique of two significantly different convolutional neural networks (CNNs) which are AlexNet and SqueezeNet networks. Each network followed by separate Convolution Long Short Term memory (ConvLSTM) to extract ro-bust and richer features from a video in the final hidden state. Then, making a fusion of these two obtained states and fed to the max-pooling layer. Final-ly, features were classified using a series of fully connected layers and soft-max classifier. The performance of the proposed method is evaluated using three standard benchmark datasets in terms of detection accuracy: Hockey Fight dataset, Movie dataset and Violent Flow dataset. The results show an accuracy of 97%, 100%, and 96% respectively. A comparison of the results with the state of the art techniques revealed the promising capability of the proposed method in recognizing violent videos.

Abstract (translated)

URL

https://arxiv.org/abs/2204.07443

PDF

https://arxiv.org/pdf/2204.07443.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot