DM-VTON: Distilled Mobile Real-time Virtual Try-On

Abstract

The fashion e-commerce industry has grown significantly in recent years, prompting the exploration of image-based virtual try-on techniques that bring Augmented Reality (AR) experiences to online shopping platforms. However, existing research has largely overlooked a crucial aspect: the runtime of the underlying machine-learning model. Existing methods prioritize output quality and often disregard execution time, which restricts their application to a limited range of devices. To address this gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel virtual try-on framework designed for simplicity and efficiency. Our approach is based on a knowledge distillation scheme in which a strong Teacher network supervises a Student network that does not rely on human parsing. Notably, we introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime while ensuring high-quality output. Additionally, we propose Virtual Try-on-guided Pose for Data Synthesis to address the limited pose variation observed in training images. Experimental results show that the proposed method achieves 40 frames per second on a single Nvidia Tesla T4 GPU and takes up only 37 MB of memory while producing almost the same output quality as other state-of-the-art methods. DM-VTON is poised to facilitate the advancement of real-time AR applications, as well as the generation of lifelike clothed human figures tailored for diverse specialized training tasks.
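
To put the reported frame rate in concrete terms, the following is a minimal sketch of how a per-frame time budget and a throughput estimate could be computed. It is not taken from the DM-VTON code: the names `frame_budget_ms`, `measure_fps`, and `dummy_inference` are hypothetical, and `dummy_inference` merely stands in for one forward pass of a try-on model.

```python
import time
from typing import Callable


def frame_budget_ms(fps: float) -> float:
    """Return the per-frame time budget in milliseconds for a target frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(run_once: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Estimate wall-clock throughput of run_once in calls per second."""
    # Warm-up calls are excluded from timing so one-off setup costs do not skew the result.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return iters / elapsed if elapsed > 0 else float("inf")


def dummy_inference() -> int:
    # Hypothetical stand-in for a single try-on forward pass (assumption, not the real model).
    return sum(range(1000))


if __name__ == "__main__":
    # 40 frames per second leaves 25 ms for each frame.
    print(f"Budget at 40 FPS: {frame_budget_ms(40):.1f} ms per frame")
    print(f"Measured throughput of the stand-in: {measure_fps(dummy_inference):.0f} calls/s")
```

When timing a real model on a GPU, execution is typically asynchronous, so the device would need to be synchronized before each clock reading for the measurement to be meaningful.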

URL

https://arxiv.org/abs/2308.13798

PDF

https://arxiv.org/pdf/2308.13798.pdf
