Abstract
Video anomaly detection (VAD) is crucial in scenarios such as surveillance and autonomous driving, where timely detection of unexpected activities is essential. Existing methods have primarily focused on coarse-grained detection, identifying either anomalous frames or anomalous objects, and often neglect finer-grained analysis such as anomalous pixels, which limits their ability to capture a broader range of anomalies. To address this challenge, we propose a new framework called Track Any Anomalous Object (TAO), which introduces a granular video anomaly detection pipeline that, for the first time, integrates the detection of multiple fine-grained anomalous objects into a unified framework. Unlike methods that assign anomaly scores to every pixel, our approach transforms the problem into pixel-level tracking of anomalous objects. By linking anomaly scores to downstream tasks such as segmentation and tracking, our method removes the need for threshold tuning and achieves more precise anomaly localization in long and complex video sequences. Experiments demonstrate that TAO sets new benchmarks in accuracy and robustness. Project page available online.
URL
https://arxiv.org/abs/2506.05175