ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

2023-05-19 06:13:26
Shiwei Jin, Zhen Wang, Lei Wang, Ning Bi, Truong Nguyen

Abstract

Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations. Because collecting and annotating such gaze data is demanding, several image synthesis methods have been proposed that redirect gaze directions precisely given assigned conditions. However, these methods focus on images that include only the eyes or a restricted face region at low resolution (less than $128\times128$), which largely reduces interference from other attributes such as hair but limits application scenarios. To cope with this limitation, we propose a portable network, called ReDirTrans, that performs latent-to-latent translation to redirect gaze directions and head orientations in an interpretable manner. ReDirTrans projects input latent vectors into embeddings of the target attributes only and redirects these embeddings with assigned pitch and yaw values. Both the initial and the edited embeddings are then projected back (deprojected) to the initial latent space as residuals, which modify the input latent vectors by subtraction and addition, representing removal of the old status and addition of the new one. Projecting only the target attributes and replacing the status through subtraction and addition mitigates the impact on other attributes and on the distribution of latent vectors. By combining ReDirTrans with a pretrained, fixed e4e-StyleGAN pair, we create ReDirTrans-GAN, which accurately redirects gaze in full-face images at $1024\times1024$ resolution while preserving other attributes such as identity, expression, and hairstyle. Furthermore, we show improvements on the downstream learning-based gaze estimation task when redirected samples are used as dataset augmentation.
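
The subtraction-addition edit described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear `project` and `deproject` matrices, the reshaping of the embedding into 3 rows, and the use of a 3x3 rotation built from pitch and yaw are all assumptions made for illustration, since the abstract only states that embeddings are redirected with assigned pitch and yaw values.

```python
import numpy as np

def rotation_matrix(pitch, yaw):
    """3x3 rotation: yaw about the y-axis followed by pitch about the x-axis."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return rot_x @ rot_y

def redirect_latent(w, project, deproject, pitch, yaw):
    """Edit latent vector w by removing the old status and adding the new one.

    project   : (3*k, d) matrix, hypothetical stand-in for the learned projector
    deproject : (d, 3*k) matrix, hypothetical stand-in for the learned deprojector
    """
    # Project the latent vector into the target-attribute embedding only.
    e_old = (project @ w).reshape(3, -1)
    # Redirect the embedding with the assigned pitch and yaw (assumed rotation).
    e_new = rotation_matrix(pitch, yaw) @ e_old
    # Deproject both embeddings back to the latent space as residuals.
    r_old = deproject @ e_old.ravel()
    r_new = deproject @ e_new.ravel()
    # Subtract the old status, add the new one.
    return w - r_old + r_new

rng = np.random.default_rng(0)
w = rng.normal(size=512)             # stand-in for one latent vector
P = rng.normal(size=(12, 512))       # random placeholder weights
D = rng.normal(size=(512, 12))       # random placeholder weights
w_edit = redirect_latent(w, P, D, pitch=0.1, yaw=-0.2)
```

In this sketch a zero pitch and yaw leave the latent vector unchanged, because the two residuals cancel; only the part of the latent space reachable through the deprojector is ever modified.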

URL

https://arxiv.org/abs/2305.11452

PDF

https://arxiv.org/pdf/2305.11452.pdf

