Paper Reading AI Learner

Towards mitigating uncanniness in face swaps via gaze-centric loss terms

2024-02-05 16:53:54
Ethan Wilson, Frederick Shic, Sophie Jörg, Eakta Jain

Abstract

Advances in face swapping have enabled the automatic generation of highly realistic faces. Yet face swaps are perceived differently than real faces, with key differences in viewer behavior surrounding the eyes. Face swapping algorithms generally place no emphasis on the eyes, relying on pixel or feature matching losses that consider the entire face to guide the training process. We further investigate viewer perception of face swaps, focusing our analysis on the presence of an uncanny valley effect. We additionally propose a novel loss equation for the training of face swapping models, leveraging a pretrained gaze estimation network to directly improve representation of the eyes. We confirm that viewed face swaps do elicit uncanny responses from viewers. Our proposed improvements significantly reduce viewing angle errors between face swaps and their source material. Our method additionally reduces the prevalence of the eyes as a deciding factor when viewers perform deepfake detection tasks. Our findings have implications for face swapping in special effects, as digital avatars, as privacy mechanisms, and more; negative responses from users could limit effectiveness in these applications. Our gaze improvements are a first step towards alleviating negative viewer perceptions via a targeted approach.
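The gaze-centric loss described in the abstract can be illustrated with a minimal sketch, assuming a PyTorch training setup and a frozen, pretrained gaze estimator (here called gaze_net) that maps an aligned face crop to a pitch/yaw pair. The names gaze_net, angular_gaze_loss, and lambda_gaze, as well as the specific angular formulation, are illustrative assumptions rather than the paper's exact implementation.

    import torch
    import torch.nn.functional as F

    def angular_gaze_loss(gaze_net, swapped_faces, source_faces):
        """Penalize gaze mismatch between a face swap and its source frame.

        gaze_net: frozen, pretrained gaze estimator returning (pitch, yaw) in radians.
        swapped_faces, source_faces: (B, 3, H, W) aligned face crops.
        """
        with torch.no_grad():                      # the source gaze is a fixed target
            target_gaze = gaze_net(source_faces)   # (B, 2) pitch/yaw
        pred_gaze = gaze_net(swapped_faces)        # gradients flow back to the swap generator

        # Convert pitch/yaw to 3D gaze direction vectors and compare their angle.
        def to_vector(g):
            pitch, yaw = g[:, 0], g[:, 1]
            return torch.stack([
                torch.cos(pitch) * torch.sin(yaw),
                torch.sin(pitch),
                torch.cos(pitch) * torch.cos(yaw),
            ], dim=1)

        cos_sim = F.cosine_similarity(to_vector(pred_gaze), to_vector(target_gaze), dim=1)
        return (1.0 - cos_sim).mean()              # 0 when gaze directions match

    # Hypothetical use inside a face-swapping training loop:
    # total_loss = reconstruction_loss + identity_loss \
    #              + lambda_gaze * angular_gaze_loss(gaze_net, swapped_faces, source_faces)

In such a setup the gaze estimator's weights stay fixed, so the term only steers the face-swap generator toward reproducing the source material's viewing angle alongside the existing pixel- or feature-matching objectives.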

Abstract (translated)

Advances in face-swapping technology have made it possible to generate highly realistic faces. However, face swaps are perceived differently than real faces, with the main differences in viewer behavior centered around the eyes. Face-swapping algorithms generally place no emphasis on the eyes, instead relying on pixel- or feature-matching losses computed over the whole face to guide training. We further investigate viewer perception of face swaps, focusing on whether an uncanny valley effect is present. We propose a novel loss equation for training face-swapping models that leverages a pretrained gaze estimation network to directly improve the representation of the eyes. We confirm that viewed face swaps elicit uncanny responses from viewers. Our proposed improvements significantly reduce the viewing-angle error between face swaps and their source material. Our method also reduces the prevalence of the eyes as a deciding factor when viewers perform deepfake detection tasks. Our findings have implications for applications of face swapping in special effects, digital avatars, privacy mechanisms, and more; negative responses from users could limit effectiveness in those applications. Our gaze improvements are a first step towards alleviating negative viewer perceptions through a targeted approach.

URL

https://arxiv.org/abs/2402.03188

PDF

https://arxiv.org/pdf/2402.03188.pdf

