ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

Abstract

Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations. To ease the demanding requirements of gaze data collection and annotation, several image synthesis methods have been proposed that redirect gaze directions precisely according to assigned conditions. However, these methods focus on images that include only the eyes or a restricted region of the face at low resolution (less than $128\times128$), which largely reduces interference from other attributes such as hair but limits application scenarios. To address this limitation, we propose a portable network, called ReDirTrans, which performs latent-to-latent translation to redirect gaze directions and head orientations in an interpretable manner. ReDirTrans projects input latent vectors into embeddings of the target attributes only and redirects these embeddings with assigned pitch and yaw values. Both the initial and edited embeddings are then projected back (deprojected) to the initial latent space as residuals, which modify the input latent vectors by subtraction and addition, representing removal of the old status and addition of the new status. Projecting only the target attributes and replacing status through subtraction and addition mitigate the impact on other attributes and on the distribution of latent vectors. By combining ReDirTrans with a pretrained, fixed e4e-StyleGAN pair, we create ReDirTrans-GAN, which accurately redirects gaze in full-face images at $1024\times1024$ resolution while preserving other attributes such as identity, expression, and hairstyle. Furthermore, we show improvements on the downstream learning-based gaze estimation task when redirected samples are used for dataset augmentation.
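
The project, redirect, deproject, and subtract-add steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, purely for illustration, that the projector and deprojector are single linear maps (`proj`, `deproj`), that each embedding is a 3-by-k matrix redirected by a rotation matrix built from pitch and yaw, and that the source angles are supplied by the caller. All function and parameter names here are hypothetical.

```python
import numpy as np

def rotation_matrix(pitch, yaw):
    """3x3 rotation built from pitch (about x) and yaw (about y), in radians."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return rot_y @ rot_x

def redirect_latent(w, proj, deproj, src_angles, tgt_angles):
    """Edit latent vector w by replacing its source-angle status with the target one.

    Assumed shapes (hypothetical): w is (d,), proj is (3*k, d), deproj is (d, 3*k).
    src_angles and tgt_angles are (pitch, yaw) tuples in radians.
    """
    k = proj.shape[0] // 3
    # Project the latent vector into the target-attribute embedding only.
    emb = (proj @ w).reshape(3, k)
    # Redirect: undo the source rotation, then apply the target rotation.
    r_src = rotation_matrix(*src_angles)
    r_tgt = rotation_matrix(*tgt_angles)
    emb_new = r_tgt @ r_src.T @ emb
    # Deproject both embeddings back to the latent space as residuals.
    res_old = deproj @ emb.reshape(-1)
    res_new = deproj @ emb_new.reshape(-1)
    # Subtract the old status and add the new status.
    return w - res_old + res_new
```

In this sketch, choosing target angles equal to the source angles leaves the latent vector unchanged, because the two residuals cancel. That is one way to read the abstract's claim that subtraction and addition limit the effect on other attributes.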

URL

https://arxiv.org/abs/2305.11452

PDF

https://arxiv.org/pdf/2305.11452.pdf
