Abstract
Virtual Try-on (VTON) involves generating images of a person wearing selected garments. Diffusion-based methods, in particular, can create high-quality images, but they struggle to preserve the identities of the input garments. We identify that this problem stems from the specifics of the diffusion training formulation. To address it, we propose a unique training scheme that limits the scope in which diffusion is trained: we use a control image that perfectly aligns with the target image during training, which in turn preserves garment details accurately during inference. We demonstrate that our method not only preserves garment details effectively but also enables layering, styling, and shoe try-on. Our method performs multi-garment try-on in a single inference pass and supports high-quality zoomed-in generations without training at higher resolutions. Finally, we show that our method surpasses prior methods in accuracy and quality.
URL
https://arxiv.org/abs/2403.13951