Paper Reading AI Learner

Deep Learning Based Facial Retargeting Using Local Patches

2026-01-13 10:56:15
Yeonsoo Choi, Inyup Lee, Sihun Cha, Seonghyeon Kim, Sunjin Jung, Junyong Noh

Abstract

In the era of digital animation, the quest to produce lifelike facial animation for virtual characters has led to the development of various retargeting methods. While retargeting facial motion between models of similar shapes has been very successful, challenges arise when the retargeting is performed on stylized or exaggerated 3D characters that deviate significantly from human facial structures. In this scenario, it is important to consider the target character's facial structure and possible range of motion so that the semantics of the original facial motions are preserved after retargeting. To achieve this, we propose a local patch-based retargeting method that transfers facial animation captured in a source performance video to a target stylized 3D character. Our method consists of three modules. The Automatic Patch Extraction Module extracts local patches from each source video frame. These patches are processed by the Reenactment Module to generate correspondingly reenacted target local patches. The Weight Estimation Module then calculates the animation parameters for the target character at every frame, producing a complete facial animation sequence. Extensive experiments demonstrate that our method successfully transfers the semantic meaning of source facial expressions to stylized characters whose facial feature proportions vary considerably.
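
To make the per-frame data flow of the three modules concrete, here is a minimal Python sketch. Every class name, method signature, and the dummy data below are hypothetical placeholders chosen to illustrate how the stages might compose; the paper does not publish this interface, and the real modules would be trained networks rather than the stubs shown here.

```python
# Hypothetical sketch of the three-stage retargeting pipeline from the
# abstract. All names and signatures are illustrative placeholders, not
# the authors' implementation.

from dataclasses import dataclass
from typing import Dict, List

Frame = List[List[float]]  # stand-in for a source video frame's pixels


@dataclass
class Patch:
    """A local facial region cropped from one frame (e.g. an eye, the mouth)."""
    region: str
    image: Frame


class PatchExtractionModule:
    """Stand-in for the Automatic Patch Extraction Module."""

    REGIONS = ["left_eye", "right_eye", "mouth"]

    def extract(self, frame: Frame) -> List[Patch]:
        # A real module would detect landmarks and crop around each facial
        # feature; here we return the whole frame once per region so the
        # pipeline runs end to end.
        return [Patch(region, frame) for region in self.REGIONS]


class ReenactmentModule:
    """Stand-in for the module that re-enacts each source patch in the
    target character's style."""

    def reenact(self, patch: Patch) -> Patch:
        # Placeholder for a learned image-to-image translation network.
        return Patch(patch.region, patch.image)


class WeightEstimationModule:
    """Stand-in for the module that regresses per-frame animation
    parameters from the re-enacted target patches."""

    def estimate(self, patches: List[Patch]) -> Dict[str, float]:
        # A real module would predict rig controls or blendshape weights;
        # here we emit one dummy weight per patch region.
        return {p.region: 0.0 for p in patches}


def retarget(frames: List[Frame]) -> List[Dict[str, float]]:
    """Runs the full pipeline frame by frame, yielding animation parameters."""
    extractor = PatchExtractionModule()
    reenactor = ReenactmentModule()
    estimator = WeightEstimationModule()

    animation = []
    for frame in frames:
        source_patches = extractor.extract(frame)
        target_patches = [reenactor.reenact(p) for p in source_patches]
        animation.append(estimator.estimate(target_patches))
    return animation


if __name__ == "__main__":
    dummy_video = [[[0.0]]] * 3  # three trivial one-pixel "frames"
    print(retarget(dummy_video))  # one parameter dict per frame
```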

URL

https://arxiv.org/abs/2601.08429

PDF

https://arxiv.org/pdf/2601.08429.pdf

