Abstract
Cross-modality data translation has attracted great interest in image computing. Deep generative models (\textit{e.g.}, GANs) show strong performance on these problems. Nevertheless, a fundamental challenge in image translation remains unsolved: zero-shot-learning cross-modality data translation with high fidelity. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which translates unseen source data to the target domain. MIDiffusion leverages a score-matching-based generative model that learns prior knowledge of the target domain. We propose a differentiable local-wise-MI-Layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures identical cross-modality features in the statistical domain to guide the diffusion; thus, our method does not require retraining when the source domain changes, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a sufficient amount of source-domain data is not always available for supervised training. We empirically show the superior performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.
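The core idea above — scoring local statistical dependence between the source image and the current denoised estimate, then nudging each sampling step toward higher mutual information — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the histogram-based MI estimator, the patch size, and the single-direction finite-difference "gradient" are all simplifying assumptions standing in for the differentiable $LMI$ layer.

```python
import numpy as np

def patch_mi(a, b, bins=8):
    """Mutual information between two equally sized patches via a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over a
    py = pxy.sum(axis=0, keepdims=True)   # marginal over b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def local_mi(src, tgt, patch=8, bins=8):
    """Average MI over non-overlapping patches (the 'local-wise' idea)."""
    h, w = src.shape
    vals = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            vals.append(patch_mi(src[i:i + patch, j:j + patch],
                                 tgt[i:i + patch, j:j + patch], bins))
    return float(np.mean(vals))

def guided_step(x_t, denoise, src, step=0.1, eps=1e-3):
    """One toy denoising step nudged toward higher local MI with the source.

    The true LMI layer is differentiable; here the MI gradient is crudely
    approximated by a single random-direction finite difference.
    """
    x = denoise(x_t)
    rng = np.random.default_rng(0)
    d = rng.standard_normal(x.shape)
    g = (local_mi(src, x + eps * d) - local_mi(src, x - eps * d)) / (2 * eps)
    return x + step * g * d
```

Because the guidance only measures statistical dependence with the source image, the denoiser itself is trained purely on the target domain, which is why swapping in a new source modality requires no retraining.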
URL
https://arxiv.org/abs/2301.13743