Abstract
In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.
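The complexity claim above can be made concrete with a small sketch. The abstract does not specify how the Modified Efficient Additive Attention is built, so the code below is not the paper's mechanism. It is a generic additive-attention layer, written under stated assumptions, that shows how per-token scalar scores can replace the n-by-n attention matrix. The function names, the weight arguments (`w_q`, `w_k`, `w_a`, `w_out`), and the softmax normalization are all hypothetical choices made for illustration.

```python
import math
import random


def matmul(x, w):
    """Multiply an (n, d_in) list of rows by a (d_in, d_out) matrix."""
    return [[sum(row[i] * w[i][j] for i in range(len(w)))
             for j in range(len(w[0]))] for row in x]


def additive_attention(x, w_q, w_k, w_a, w_out):
    """Generic linear-time additive attention (illustrative assumption,
    not the CUE-Net layer). x holds n token vectors of width d."""
    d = len(w_a)
    q = matmul(x, w_q)  # queries, shape (n, d)
    k = matmul(x, w_k)  # keys, shape (n, d)

    # One scalar score per token: cost grows with n, not n * n.
    scores = [sum(qv * av for qv, av in zip(row, w_a)) / math.sqrt(d)
              for row in q]

    # Softmax over tokens (an assumed normalization choice).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Pool all queries into a single global query vector of width d.
    g = [sum(alpha[t] * q[t][j] for t in range(len(q))) for j in range(d)]

    # Each key interacts with the global query element-wise, is projected,
    # and the token's own query is added back as a residual.
    mixed = [[g[j] * row[j] for j in range(d)] for row in k]
    proj = matmul(mixed, w_out)
    return [[p + qv for p, qv in zip(p_row, q_row)]
            for p_row, q_row in zip(proj, q)]


def random_matrix(rows, cols):
    """Hypothetical helper: Gaussian weights for the demo only."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n, d = 6, 4  # 6 tokens of width 4
    tokens = random_matrix(n, d)
    out = additive_attention(tokens, random_matrix(d, d), random_matrix(d, d),
                             [random.gauss(0.0, 1.0) for _ in range(d)],
                             random_matrix(d, d))
    print(len(out), len(out[0]))
```

Because every token is compared against one pooled global query rather than against every other token, the work per layer scales linearly with the number of tokens. Standard self-attention instead forms all pairwise token scores, which is the quadratic cost the abstract refers to.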
URL
https://arxiv.org/abs/2404.18952