Robustness Evaluation in Hand Pose Estimation Models using Metamorphic Testing

2023-03-08 13:23:53
Muxin Pu, Chun Yong Chong, Mei Kuan Lim

Abstract

Hand pose estimation (HPE) is the task of predicting and describing hand poses from images or video frames. HPE models normally perform well on hand poses captured in a laboratory or under controlled conditions. However, the real-world environment is complex, and various uncertainties can arise that degrade the performance of HPE models. For example, hands may be occluded, their visibility may be reduced by imperfect exposure, and their contours are prone to blurring during fast hand movements. In this work, we adopt metamorphic testing to evaluate the robustness of HPE models and provide suggestions on the choice of HPE models for different applications. The robustness evaluation was conducted on four state-of-the-art models, namely MediaPipe hands, OpenPose, BodyHands, and NSRM hand. We found that, on average, more than 80% of hands could not be identified by BodyHands and at least 50% of hands could not be identified by MediaPipe hands when diagonal motion blur was introduced, while on average more than 50% of strongly underexposed hands could not be correctly estimated by NSRM hand. Similarly, occluding only four hand joints also substantially degrades the performance of these models. The experimental results show that occlusions, illumination variations, and motion blur are the main obstacles to the performance of existing HPE models. These findings may pave the way for researchers to improve the performance and robustness of hand pose estimation models and their applications.
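
The kind of metamorphic relation the abstract describes can be sketched roughly as follows: if a model detects a hand in a source image, it should still detect it after a perturbation such as diagonal motion blur or underexposure. This is only an illustrative sketch using NumPy, not the authors' implementation. The function names (`diagonal_motion_blur`, `underexpose`, `violation_rate`) and the `detector` callable are hypothetical; a real study would replace the detector with an actual HPE model.

```python
import numpy as np


def diagonal_motion_blur(image: np.ndarray, length: int = 9) -> np.ndarray:
    """Blur a 2-D grayscale image along the main diagonal.

    Averages `length` copies of the image shifted along the diagonal,
    which is equivalent to convolving with a normalized diagonal kernel.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer")
    half = length // 2
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), half, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for k in range(length):
        out += padded[k:k + h, k:k + w]
    return out / length


def underexpose(image: np.ndarray, factor: float = 0.3) -> np.ndarray:
    """Darken an image by scaling intensities (assumed range 0..255)."""
    return np.clip(image.astype(np.float64) * factor, 0.0, 255.0)


def violation_rate(detector, images, transform) -> float:
    """Fraction of source-detected images that are missed after `transform`.

    Images in which the detector finds nothing in the source are skipped,
    because the relation only applies to hands detected in the source.
    """
    detected = 0
    violations = 0
    for img in images:
        if not detector(img):
            continue
        detected += 1
        if not detector(transform(img)):
            violations += 1
    return violations / detected if detected else 0.0
```

With a real model, `detector` would wrap the model's inference call and return whether any hand was found, and `violation_rate` would then give a per-transformation figure comparable in spirit to the percentages reported above.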

URL

https://arxiv.org/abs/2303.04566

PDF

https://arxiv.org/pdf/2303.04566.pdf

