Evaluation of facial landmark localization performance in a surgical setting

2025-07-24 09:40:47
Ines Frajtag, Marko Švaco, Filip Šuligoj

Abstract

Robotics and computer vision are increasingly used in many fields, including medicine. Many face detection algorithms have found applications in neurosurgery, ophthalmology, and plastic surgery. Common challenges in using these algorithms are variable lighting conditions and the need for flexible detection positions when identifying and precisely localizing patients. The proposed experiment tests the MediaPipe algorithm for detecting facial landmarks in a controlled setting, using a robotic arm that automatically adjusts positions while the surgical light and the phantom remain in a fixed position. The results demonstrate that the improved accuracy of facial landmark detection under surgical lighting significantly enhances detection performance at larger yaw and pitch angles. The increase in standard deviation (dispersion) is due to imprecise detection of selected facial landmarks. This analysis allows for a discussion of the potential integration of the MediaPipe algorithm into medical procedures.

URL

https://arxiv.org/abs/2507.18248

PDF

https://arxiv.org/pdf/2507.18248.pdf

