Fill in Fabrics: Body-Aware Self-Supervised Inpainting for Image-Based Virtual Try-On

2022-10-03 13:25:31
H. Zunair, Y. Gobeil, S. Mercier, A. Ben Hamza

Abstract

Previous virtual try-on methods usually focus on aligning a clothing item with a person, limiting their ability to exploit the complex pose, shape and skin color of the person, as well as the overall structure of the clothing, which is vital to photo-realistic virtual try-on. To address this potential weakness, we propose a fill in fabrics (FIFA) model, a self-supervised conditional generative adversarial network based framework comprising a Fabricator and a unified virtual try-on pipeline with a Segmenter, Warper and Fuser. The Fabricator aims to reconstruct the clothing image when provided with a masked clothing image as input, and learns the overall structure of the clothing by filling in fabrics. A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to the Warper in an effort to warp and refine the target clothing. We also propose to use a multi-scale structural constraint to enforce global context at multiple scales while warping the target clothing to better fit the pose and shape of the person. Extensive experiments demonstrate that our FIFA model achieves state-of-the-art results on the standard VITON dataset for virtual try-on of clothing items, and is shown to be effective at handling complex poses and retaining the texture and embroidery of the clothing.
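
The two ideas named in the abstract, reconstructing a clothing image from a masked copy and comparing images at several scales, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the mask shape, the loss, or the number of scales, so the rectangular mask, the L1 distance, the 2x average pooling, and every function name below are assumptions made for illustration only.

```python
def mask_rectangle(image, top, left, height, width, fill=0.0):
    """Return a copy of `image` (list of rows) with one rectangle set to `fill`.

    Assumption: a single rectangular hole stands in for the masked clothing input.
    """
    masked = [row[:] for row in image]
    for r in range(top, min(top + height, len(image))):
        for c in range(left, min(left + width, len(image[0]))):
            masked[r][c] = fill
    return masked


def downsample2x(image):
    """Average-pool by a factor of 2 in both dimensions (even sizes assumed)."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def l1(a, b):
    """Mean absolute difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def multiscale_l1(pred, target, num_scales=3):
    """Sum of L1 distances at `num_scales` resolutions (hypothetical stand-in
    for a multi-scale structural constraint)."""
    total = 0.0
    for scale in range(num_scales):
        total += l1(pred, target)
        if scale < num_scales - 1:
            pred, target = downsample2x(pred), downsample2x(target)
    return total


# Toy 8x8 grayscale "clothing image" with values in [0, 1].
image = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
# Self-supervised pair: the masked copy is the input, the original is the target.
masked = mask_rectangle(image, top=2, left=2, height=4, width=4)
# A model that left the hole unfilled would incur this multi-scale penalty.
loss = multiscale_l1(masked, image)
```

In a real training setup the masked copy would be fed to a generator network and the loss computed on its output; here the unfilled masked image is scored directly only to show that the penalty is zero for a perfect reconstruction and positive otherwise.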

URL

https://arxiv.org/abs/2210.00918

PDF

https://arxiv.org/pdf/2210.00918.pdf

