Paper Reading AI Learner

Sketch2Human: Deep Human Generation with Disentangled Geometry and Appearance Control

2024-04-24 14:24:57
Linzi Qu, Jiaxiang Shang, Hui Ye, Xiaoguang Han, Hongbo Fu

Abstract

Geometry- and appearance-controlled full-body human image generation is an interesting but challenging task. Existing solutions are either unconditional or dependent on coarse conditions (e.g., pose, text), thus lacking explicit geometry and appearance control of body and garment. Sketching offers such editing ability and has been adopted in various sketch-based face generation and editing solutions. However, directly adapting sketch-based face generation to full-body generation often fails to produce high-fidelity and diverse results due to the high complexity and diversity in the pose, body shape, and garment shape and texture. Recent geometrically controllable diffusion-based methods mainly rely on prompts to generate appearance and it is hard to balance the realism and the faithfulness of their results to the sketch when the input is coarse. This work presents Sketch2Human, the first system for controllable full-body human image generation guided by a semantic sketch (for geometry control) and a reference image (for appearance control). Our solution is based on the latent space of StyleGAN-Human with inverted geometry and appearance latent codes as input. Specifically, we present a sketch encoder trained with a large synthetic dataset sampled from StyleGAN-Human's latent space and directly supervised by sketches rather than real images. Considering the entangled information of partial geometry and texture in StyleGAN-Human and the absence of disentangled datasets, we design a novel training scheme that creates geometry-preserved and appearance-transferred training data to tune a generator to achieve disentangled geometry and appearance control. Although our method is trained with synthetic data, it can handle hand-drawn sketches as well. Qualitative and quantitative evaluations demonstrate the superior performance of our method to state-of-the-art methods.

Abstract (translated)

几何和外观控全身体像生成是一个有趣但具有挑战性的任务。现有的解决方案或依赖于粗略的条件(例如姿态,文本),从而缺乏对身体的几何和外观控制。绘画提供了这种编辑能力,并在各种基于草图的脸部生成和编辑解决方案中得到了应用。然而,直接将基于草图的脸部生成应用到全身生成通常由于姿态、身体形状和服装形状和纹理的高复杂性和多样性而无法产生高质量和多样性的结果。最近,基于几何可控制扩散的解决方案主要依赖于提示生成外观,而在输入粗糙时很难平衡现实感和结果的准确性。 这项工作提出了Sketch2Human,第一个基于语义草图的全身体人像生成系统(用于几何控制)和参考图像(用于外观控制)。我们的解决方案基于StyleGAN-Human的潜在空间,具有倒置的形状和外观潜在代码作为输入。具体来说,我们提出了一种用从StyleGAN-Human的潜在空间中采样的大规模合成数据训练的草图编码器,并直接从草图中监督生成图像。考虑到StyleGAN-Human中部分几何和纹理的纠缠信息以及缺乏解耦的数据集,我们设计了一种新的训练计划,以创建几何保留和外观传递的训练数据来调整生成器实现解耦几何和外观控制。尽管我们的方法是基于合成数据训练的,但它还可以处理手绘草图。定性和定量评估证明了我们的方法在现有方法中的卓越性能。

URL

https://arxiv.org/abs/2404.15889

PDF

https://arxiv.org/pdf/2404.15889.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot