Gait Recognition from Highly Compressed Videos

Abstract

Surveillance footage is a valuable resource for gait analysis. However, the low quality and high noise levels typical of such footage can severely reduce the accuracy of pose estimation algorithms, which are foundational for reliable gait analysis. Existing literature suggests a direct correlation between the efficacy of pose estimation and the subsequent gait analysis results. A common mitigation strategy is to fine-tune pose estimation models on noisy data to improve robustness. However, this approach may degrade the downstream model's performance on the original high-quality data, a trade-off that is undesirable in practice. We propose a processing pipeline that incorporates a task-targeted artifact correction model specifically designed to pre-process and enhance surveillance footage before pose estimation. Our artifact correction model is optimized to work alongside a state-of-the-art pose estimation network, HRNet, without requiring repeated fine-tuning of the pose estimation model. Furthermore, we propose a simple and robust method for automatically obtaining low-quality videos annotated with poses, for the purpose of training the artifact correction model. We systematically evaluate our artifact correction model on a range of noisy surveillance data and demonstrate that our approach not only improves pose estimation on low-quality surveillance footage, but also preserves the integrity of pose estimation on high-resolution footage. Our experiments show a clear improvement in gait analysis performance, supporting the viability of the proposed method as a superior alternative to direct fine-tuning strategies. Our contributions pave the way for more reliable gait analysis using surveillance data in real-world applications, regardless of data quality.

URL

https://arxiv.org/abs/2404.12183

PDF

https://arxiv.org/pdf/2404.12183.pdf
