ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

2024-03-20 19:45:06
Jeffrey Zhang, Kedan Li, Shao-Yu Chang, David Forsyth

Abstract

Virtual try-on (VTON) involves generating images of a person wearing selected garments. Diffusion-based methods, in particular, can create high-quality images, but they struggle to maintain the identities of the input garments. We identify that this problem stems from specifics of the training formulation for diffusion. To address this, we propose a unique training scheme that limits the scope in which diffusion is trained. We use a control image that perfectly aligns with the target image during training. In turn, this accurately preserves garment details during inference. We demonstrate that our method not only effectively conserves garment details but also allows for layering, styling, and shoe try-on. Our method runs multi-garment try-on in a single inference cycle and can support high-quality zoomed-in generations without training at higher resolutions. Finally, we show that our method surpasses prior methods in accuracy and quality.

URL

https://arxiv.org/abs/2403.13951

PDF

https://arxiv.org/pdf/2403.13951.pdf
