Paper Reading AI Learner

DM-VTON: Distilled Mobile Real-time Virtual Try-On

2023-08-26 07:46:27
Khoi-Nguyen Nguyen-Ngoc, Thanh-Tung Phan-Nguyen, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract

The fashion e-commerce industry has witnessed significant growth in recent years, prompting the exploration of image-based virtual try-on techniques that bring Augmented Reality (AR) experiences to online shopping platforms. However, existing research has largely overlooked a crucial aspect: the runtime of the underlying machine-learning model. While existing methods prioritize output quality, they often disregard execution time, which restricts their application to a limited range of devices. To address this gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel virtual try-on framework designed for simplicity and efficiency. Our approach is based on a knowledge distillation scheme in which a strong Teacher network supervises a Student network that does not rely on human parsing. Notably, we introduce an efficient Mobile Generative Module within the Student network, which significantly reduces runtime while maintaining high-quality output. Additionally, we propose Virtual Try-on-guided Pose for Data Synthesis to address the limited pose variation observed in training images. Experimental results show that the proposed method achieves 40 frames per second on a single Nvidia Tesla T4 GPU and occupies only 37 MB of memory, while producing output quality almost on par with other state-of-the-art methods. DM-VTON stands poised to facilitate the advancement of real-time AR applications, as well as the generation of lifelike attired human figures tailored for diverse specialized training tasks. this https URL
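The abstract describes a Teacher network that supervises a lighter Student network. The toy sketch below illustrates only the general idea of response-based knowledge distillation: a fixed "teacher" function produces targets, and a cheaper "student" is fitted to reproduce them. It rests on several assumptions. Both models are hypothetical scalar functions rather than image generators, and the abstract does not specify the losses or training pipeline DM-VTON actually uses, so this is not the paper's method.

```python
"""Toy response-based knowledge distillation (illustrative only, not DM-VTON)."""


def teacher(x: float) -> float:
    # Hypothetical stand-in for a large pretrained model:
    # a fixed non-linear mapping that the student should imitate.
    return 2.0 * x + 0.5 * x * x


def distill(inputs, epochs=200, lr=0.05):
    """Fit a two-parameter linear student (w, b) to the teacher's outputs."""
    w, b = 0.0, 0.0
    targets = [teacher(x) for x in inputs]  # teacher outputs act as supervision
    n = len(inputs)
    for _ in range(epochs):
        # Gradients of the mean squared error between student and teacher outputs.
        grad_w = sum(2 * (w * x + b - t) * x for x, t in zip(inputs, targets)) / n
        grad_b = sum(2 * (w * x + b - t) for x, t in zip(inputs, targets)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, inputs):
    """Mean squared disagreement between the student (w, b) and the teacher."""
    return sum((w * x + b - teacher(x)) ** 2 for x in inputs) / len(inputs)


if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    w, b = distill(xs)
    print(f"student: w={w:.3f}, b={b:.3f}, mse={mse(w, b, xs):.4f}")
```

The student cannot represent the teacher's quadratic term, so it converges to the best linear approximation. This mirrors the size-versus-fidelity trade-off that distillation accepts in exchange for a smaller, faster model.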

URL

https://arxiv.org/abs/2308.13798

PDF

https://arxiv.org/pdf/2308.13798.pdf