Abstract
Image-based virtual try-on, which aims to synthesize images of a specific person wearing a specified garment, is an increasingly important task for online shopping. Diffusion model-based approaches have recently become popular, as they excel at image synthesis. However, these approaches usually employ additional image encoders and rely on the cross-attention mechanism to transfer texture from the garment to the person image, which limits the efficiency and fidelity of the try-on. To address these issues, we propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results without introducing additional image encoders. Accordingly, we make contributions from two aspects. First, we propose to concatenate the masked person and reference garment images along the spatial dimension and use the resulting image as the input to the denoising UNet of the diffusion model. This enables the original self-attention layers contained in the diffusion model to achieve efficient and accurate texture transfer. Second, we propose a novel diffusion-based method that predicts a precise inpainting mask from the person and reference garment images, further enhancing the reliability of the try-on results. In addition, we integrate mask prediction and image synthesis into a single compact model. Experimental results show that our approach can be applied to various try-on tasks, e.g., garment-to-person and person-to-person try-on, and significantly outperforms state-of-the-art methods on the popular VITON and VITON-HD databases.
Abstract (translated)
Image-based virtual try-on is becoming increasingly important; it aims to synthesize an image of a specific person wearing a specified garment. In recent years, diffusion model-based methods have grown in popularity because they perform excellently on image synthesis tasks. However, these methods usually require additional image encoders and rely on a cross-attention mechanism to transfer texture from the garment to the person image, which affects the efficiency and fidelity of the try-on. To address these issues, we propose a Texture-Preserving Diffusion (TPD) model for virtual try-on, which improves the fidelity of the results without adding extra image encoders. We make contributions in two aspects. First, we propose concatenating the masked person image and the reference garment image along the spatial dimension and using the resulting image as the input to the diffusion model's denoising UNet, which allows the diffusion model's original self-attention layers to achieve efficient and accurate texture transfer. Second, we propose a novel diffusion-based method that predicts a precise inpainting mask from the person and reference garment images, further improving the reliability of the try-on results. In addition, we integrate mask prediction and image synthesis into a single compact model. Experimental results show that our method can be applied to various try-on tasks, such as garment-to-person and person-to-person try-on, and significantly outperforms state-of-the-art methods on the popular VITON and VITON-HD databases.
URL
https://arxiv.org/abs/2404.01089
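The abstract's first contribution, concatenating the masked person image and the reference garment image along the spatial dimension before feeding the result to the denoising UNet, can be illustrated with a minimal sketch. This is not the authors' code: the function name and image shapes are hypothetical, and NumPy stands in for the actual tensor framework, but the concatenation step itself matches what the abstract describes.

```python
import numpy as np

def concat_person_garment(masked_person: np.ndarray, garment: np.ndarray) -> np.ndarray:
    """Stack the masked person image and the reference garment image along the
    spatial (height) dimension. Feeding this composite to the denoising UNet
    lets its existing self-attention layers attend across both regions, so
    garment texture can be transferred without an extra image encoder.

    Both inputs are (channels, height, width) arrays; channels and width
    must match so the images align side by side along height.
    """
    if masked_person.shape[0] != garment.shape[0] or masked_person.shape[2] != garment.shape[2]:
        raise ValueError("channel and width dimensions must match")
    return np.concatenate([masked_person, garment], axis=1)  # axis=1 is height

# Toy example with illustrative sizes: two 3-channel 64x48 images.
person = np.random.rand(3, 64, 48).astype(np.float32)
garment = np.random.rand(3, 64, 48).astype(np.float32)
unet_input = concat_person_garment(person, garment)
print(unet_input.shape)  # (3, 128, 48): person occupies the top half, garment the bottom
```

Because the composite is a single image, self-attention over its flattened spatial positions naturally computes attention between person pixels and garment pixels, which is the mechanism the paper relies on for texture transfer instead of cross-attention against a separately encoded garment.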