Abstract
Human pose estimation (HPE) has become essential in numerous applications, including healthcare, activity recognition, and human-computer interaction. However, the privacy implications of processing sensitive visual data present significant barriers to deployment in critical domains. Traditional anonymization techniques offer limited protection and often compromise data utility for broader motion analysis, whereas Differential Privacy (DP) provides formal privacy guarantees but typically degrades model performance when applied naively. In this work, we present the first differentially private method for 2D human pose estimation (2D-HPE), obtained by applying Differentially Private Stochastic Gradient Descent (DP-SGD) to this task. To balance privacy with performance, we adopt Projected DP-SGD (PDP-SGD), which projects the noisy gradients onto a low-dimensional subspace. Additionally, we adapt TinyViT, a compact and efficient vision transformer, to coordinate classification in HPE, providing a lightweight yet powerful backbone that makes privacy-preserving deployment more feasible on resource-limited devices. Our approach is particularly valuable for multimedia interpretation tasks, enabling privacy-safe analysis and understanding of human motion across diverse visual media while preserving the semantic meaning required for downstream applications. Comprehensive experiments on the MPII Human Pose Dataset show a substantial improvement: PDP-SGD achieves 78.48% PCKh@0.5 at a strict privacy budget ($\epsilon=0.2$), compared with 63.85% for standard DP-SGD. This work lays a foundation for privacy-preserving human pose estimation in real-world, sensitive applications.
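The projection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard DP-SGD recipe (per-sample gradient clipping followed by Gaussian noise) and assumes that an orthonormal basis for the low-dimensional subspace is already available, since the abstract does not say how that subspace is obtained. All function and parameter names (`clip`, `noisy_mean_gradient`, `project`, `pdp_sgd_step`, `max_norm`, `noise_multiplier`) are hypothetical, and no privacy accounting (the mapping from noise level to $\epsilon$) is shown.

```python
import math
import random


def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def noisy_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """DP-SGD aggregate: clip each per-sample gradient, sum, add noise, average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Gaussian noise with standard deviation proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


def project(vec, basis):
    """Project vec onto the span of the rows of basis (assumed orthonormal)."""
    out = [0.0] * len(vec)
    for b in basis:
        coeff = sum(v * bi for v, bi in zip(vec, b))
        for i, bi in enumerate(b):
            out[i] += coeff * bi
    return out


def pdp_sgd_step(params, per_sample_grads, basis, lr,
                 max_norm, noise_multiplier, rng):
    """One projected step: noisy clipped gradient, projection, then descent."""
    noisy = noisy_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng)
    projected = project(noisy, basis)
    return [p - lr * g for p, g in zip(params, projected)]


# Toy usage: 4 parameters, subspace spanned by the first two coordinate axes.
rng = random.Random(0)
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
grads = [[0.5, -0.2, 3.0, 0.1], [0.4, -0.1, -2.0, 0.0]]
new_params = pdp_sgd_step([0.0] * 4, grads, basis, lr=0.1,
                          max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

In this toy run the noise added to the last two coordinates is discarded by the projection, so those parameters do not move. This is the intuition behind projecting: noise components outside the chosen subspace no longer perturb the update.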
URL
https://arxiv.org/abs/2504.10190