Abstract
Video anomaly detection (VAD) is an important but challenging task in computer vision. The main challenge arises from the scarcity of training samples that cover all anomaly cases. Hence, semi-supervised anomaly detection methods have received growing attention, since they model only normal data and detect anomalies by measuring deviations from normal patterns. Despite impressive advances of these methods in modeling normal motion and appearance, long-term motion modeling has not yet been explored effectively. Inspired by the abilities of the future frame prediction proxy task, we introduce future video prediction from a single frame as a novel proxy task for video anomaly detection. This proxy task alleviates the difficulty previous methods have in learning longer motion patterns. Moreover, we replace the initial and future raw frames with their corresponding semantic segmentation maps, which not only makes the method aware of object classes but also makes the prediction task less complex for the model. Extensive experiments on the benchmark datasets (ShanghaiTech, UCSD-Ped1, and UCSD-Ped2) show the effectiveness of the method and its superior performance compared to state-of-the-art prediction-based VAD methods.
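The idea of detecting anomalies by measuring how far observed frames deviate from predicted semantic segmentation maps can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything below is an assumption made for illustration: the per-pixel class disagreement measure, the per-clip min-max normalization, and the names `frame_disagreement` and `anomaly_scores` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# A segmentation map is modeled here as an H x W grid of integer class ids
# (an assumption for illustration; the paper's representation may differ).
SegMap = Sequence[Sequence[int]]


def frame_disagreement(predicted: SegMap, observed: SegMap) -> float:
    """Return the fraction of pixels whose predicted class differs from the observed class."""
    total = 0
    mismatched = 0
    for pred_row, obs_row in zip(predicted, observed):
        for pred_class, obs_class in zip(pred_row, obs_row):
            total += 1
            mismatched += int(pred_class != obs_class)
    return mismatched / total if total else 0.0


def anomaly_scores(
    predicted_clip: Sequence[SegMap], observed_clip: Sequence[SegMap]
) -> List[float]:
    """Min-max normalize per-frame disagreement within one clip (hypothetical scoring rule)."""
    raw = [frame_disagreement(p, o) for p, o in zip(predicted_clip, observed_clip)]
    if not raw:
        return []
    lowest, highest = min(raw), max(raw)
    if highest == lowest:
        # All frames deviate equally, so no frame stands out within this clip.
        return [0.0 for _ in raw]
    return [(value - lowest) / (highest - lowest) for value in raw]


if __name__ == "__main__":
    # Two 2x2 frames: the first is predicted correctly, the second is half wrong.
    predicted = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
    observed = [[[0, 0], [1, 1]], [[0, 2], [2, 1]]]
    print(anomaly_scores(predicted, observed))
```

Under these assumptions, frames where the predicted segmentation diverges most from what is actually observed receive scores close to 1, which matches the intuition that a model trained only on normal data predicts abnormal events poorly.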
URL
https://arxiv.org/abs/2308.07783