Abstract
Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or rely on coarse conditions (e.g., pose, text), and thus lack explicit geometry and appearance control over the body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutions. However, directly adapting sketch-based face generation to full-body generation often fails to produce high-fidelity and diverse results because of the high complexity and diversity of pose, body shape, and garment shape and texture. Recent geometrically controllable diffusion-based methods mainly rely on prompts to generate appearance, and it is hard to balance the realism of their results with faithfulness to the sketch when the input is coarse. This work presents Sketch2Human, the first system for controllable full-body human image generation guided by a semantic sketch (for geometry control) and a reference image (for appearance control). Our solution builds on the latent space of StyleGAN-Human, taking inverted geometry and appearance latent codes as input. Specifically, we present a sketch encoder trained on a large synthetic dataset sampled from StyleGAN-Human's latent space and supervised directly by sketches rather than real images. Considering the entanglement of partial geometry and texture information in StyleGAN-Human and the absence of disentangled datasets, we design a novel training scheme that creates geometry-preserved and appearance-transferred training data to fine-tune a generator for disentangled geometry and appearance control. Although our method is trained with synthetic data, it also handles hand-drawn sketches. Qualitative and quantitative evaluations demonstrate the superior performance of our method over state-of-the-art methods.
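The abstract describes an inference flow in which a sketch encoder yields geometry latent codes and an inverted reference image yields appearance latent codes, both living in StyleGAN-Human's latent space. The minimal sketch below illustrates one way such codes could be combined via StyleGAN-style layer-wise mixing before decoding; the module names, the 18x512 W+ layout, and the coarse/fine split index are illustrative assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch of geometry/appearance latent mixing, assuming a
# StyleGAN-style W+ space. Encoder, split point, and shapes are assumptions.
import torch
import torch.nn as nn

NUM_WS, W_DIM = 18, 512   # assumed W+ layout (StyleGAN-style)
COARSE_LAYERS = 8         # assumed split: early layers carry geometry

class SketchEncoder(nn.Module):
    """Stand-in for a sketch encoder: semantic sketch -> W+ geometry codes."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, NUM_WS * W_DIM),
        )

    def forward(self, sketch):
        return self.backbone(sketch).view(-1, NUM_WS, W_DIM)

def mix_latents(w_geometry, w_appearance, split=COARSE_LAYERS):
    """Take geometry-bearing coarse layers from the sketch branch and
    appearance-bearing fine layers from the reference branch."""
    return torch.cat([w_geometry[:, :split], w_appearance[:, split:]], dim=1)

# Toy usage with random tensors standing in for a real sketch, a reference
# image inversion, and the fine-tuned StyleGAN-Human generator.
sketch = torch.randn(1, 1, 256, 256)      # semantic sketch (geometry input)
w_app = torch.randn(1, NUM_WS, W_DIM)     # inverted reference-image codes
w_geo = SketchEncoder()(sketch)           # sketch -> geometry codes
w_mixed = mix_latents(w_geo, w_app)       # would be fed to the tuned generator
print(w_mixed.shape)                      # torch.Size([1, 18, 512])
```

In this reading, the fine-tuned generator consumes the mixed codes so that the sketch dictates geometry while the reference image dictates appearance; the paper's training scheme with geometry-preserved and appearance-transferred data is what makes that split behave in a disentangled way.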
URL
https://arxiv.org/abs/2404.15889