Differentially Private 2D Human Pose Estimation

2025-04-14 12:50:37
Kaushik Bhargav Sivangi, Idris Zakariyya, Paul Henderson, Fani Deligianni

Abstract

Human pose estimation (HPE) has become essential in numerous applications including healthcare, activity recognition, and human-computer interaction. However, the privacy implications of processing sensitive visual data present significant deployment barriers in critical domains. Traditional anonymization techniques offer limited protection and often compromise data utility for broader motion analysis, whereas Differential Privacy (DP) provides formal privacy guarantees but typically degrades model performance when applied naively. In this work, we present the first differentially private 2D human pose estimation (2D-HPE) method, obtained by applying Differentially Private Stochastic Gradient Descent (DP-SGD) to this task. To effectively balance privacy with performance, we adopt Projected DP-SGD (PDP-SGD), which projects the noisy gradients onto a low-dimensional subspace. Additionally, we adapt TinyViT, a compact and efficient vision transformer, for coordinate classification in HPE, providing a lightweight yet powerful backbone that makes privacy-preserving deployment more feasible on resource-limited devices. Our approach is particularly valuable for multimedia interpretation tasks, enabling privacy-safe analysis and understanding of human motion across diverse visual media while preserving the semantic meaning required for downstream applications. Comprehensive experiments on the MPII Human Pose Dataset demonstrate a significant performance improvement, with PDP-SGD achieving 78.48% PCKh@0.5 at a strict privacy budget ($\epsilon=0.2$), compared to 63.85% for standard DP-SGD. This work lays the foundation for privacy-preserving human pose estimation in real-world, sensitive applications.
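
The abstract describes PDP-SGD only at a high level (noisy gradients projected onto a low-dimensional subspace). The following is a minimal NumPy sketch of one such gradient estimate, not the authors' implementation. It assumes the subspace is estimated from the gradients of a small public dataset via SVD; the function name `pdp_sgd_step` and the parameters `clip_norm`, `noise_multiplier`, and `k` are hypothetical and chosen for illustration.

```python
import numpy as np

def pdp_sgd_step(per_sample_grads, public_grads, clip_norm, noise_multiplier, k, rng):
    """Return one projected, noisy gradient estimate (illustrative sketch).

    per_sample_grads: (batch, dim) per-example gradients on private data
    public_grads:     (m, dim) gradients on public data (assumption: used
                      only to estimate the projection subspace)
    """
    # 1. Clip each per-example gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # 2. Sum, add Gaussian noise with std = noise_multiplier * clip_norm,
    #    then average over the batch (the usual DP-SGD estimate).
    batch, dim = clipped.shape
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    noisy = (clipped.sum(axis=0) + noise) / batch

    # 3. Estimate a k-dimensional subspace: the top-k right singular
    #    vectors of the public gradient matrix (orthonormal rows).
    _, _, vt = np.linalg.svd(public_grads, full_matrices=False)
    basis = vt[:k]  # shape (k, dim)

    # 4. Project the noisy gradient onto that subspace.
    return basis.T @ (basis @ noisy)
```

Because the projection discards the noise components that lie outside the `k`-dimensional subspace, the effective noise in the update shrinks when `k` is much smaller than the parameter dimension; this is the intuition behind the utility gain over plain DP-SGD reported in the abstract. The privacy accounting that maps `noise_multiplier` to a value of $\epsilon$ is not shown here.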

URL

https://arxiv.org/abs/2504.10190

PDF

https://arxiv.org/pdf/2504.10190.pdf

