Paper Reading AI Learner

Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic Diffusion

2023-01-31 16:24:34
Zihao Wang, Yingyu Yang, Maxime Sermesant, Hervé Delingette, Ona Wu

Abstract

Cross-modality data translation has attracted great interest in image computing. Deep generative models (e.g., GANs) show performance improvements in tackling these problems. Nevertheless, the problem of zero-shot-learning cross-modality data translation with fidelity, a fundamental challenge in image translation, remains open. This paper proposes a new unsupervised zero-shot-learning method named the Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate unseen source data to the target domain. MIDiffusion leverages a score-matching-based generative model that learns prior knowledge of the target domain. We propose a differentiable local-wise mutual information layer (LMI) for conditioning the iterative denoising sampling. The LMI captures shared cross-modality features in the statistical domain to guide the diffusion; thus, our method does not require retraining when the source domain changes, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a sufficiently large source-domain dataset is not always available for supervised training. We empirically demonstrate the superior performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.
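The abstract describes two moving parts: a score-based diffusion model trained only on the target domain, and a differentiable local mutual information (LMI) term that conditions each denoising step on the source image. The sketch below only illustrates that conditioning idea and is not the authors' released code: the Parzen-window soft histogram stands in for the paper's LMI layer, the DDIM/DDPM-style update stands in for the paper's stochastic score-based sampler, and the function names (soft_joint_histogram, local_mutual_information, mi_guided_step), the eps_model noise predictor, and the guidance scale are all assumptions.

# Minimal sketch (assumed names, not the authors' implementation): mutual-information
# guided reverse diffusion sampling with a differentiable Parzen-window MI estimate.
import torch


def soft_joint_histogram(a, b, bins=32, sigma=0.05):
    # Parzen-window (Gaussian kernel) joint histogram of two images scaled to [0, 1];
    # differentiable with respect to both inputs.
    centers = torch.linspace(0.0, 1.0, bins, device=a.device)
    wa = torch.exp(-0.5 * ((a.reshape(-1, 1) - centers) / sigma) ** 2)   # (N, bins)
    wb = torch.exp(-0.5 * ((b.reshape(-1, 1) - centers) / sigma) ** 2)   # (N, bins)
    joint = wa.t() @ wb                                                  # (bins, bins)
    return joint / joint.sum().clamp_min(1e-12)


def mutual_information(a, b, bins=32, eps=1e-12):
    # Differentiable MI estimate between two images with intensities in [0, 1].
    p_ab = soft_joint_histogram(a, b, bins)
    p_a = p_ab.sum(dim=1, keepdim=True)
    p_b = p_ab.sum(dim=0, keepdim=True)
    return (p_ab * (torch.log(p_ab + eps) - torch.log(p_a @ p_b + eps))).sum()


def local_mutual_information(source, estimate, patch=16, bins=32):
    # Average MI over non-overlapping local patches -- a stand-in for the LMI layer.
    total, count = 0.0, 0
    for i in range(0, source.shape[-2] - patch + 1, patch):
        for j in range(0, source.shape[-1] - patch + 1, patch):
            total = total + mutual_information(
                source[..., i:i + patch, j:j + patch],
                estimate[..., i:i + patch, j:j + patch],
                bins,
            )
            count += 1
    return total / max(count, 1)


@torch.enable_grad()
def mi_guided_step(x_t, t, source, eps_model, alphas_cumprod,
                   guidance_scale=1.0, eta=1.0):
    # One DDIM/DDPM-style reverse step: the gradient of local MI between the fixed
    # source image and the current denoised estimate steers sampling toward structure
    # shared with the source, while eps_model encodes the target-domain prior.
    x_t = x_t.detach().requires_grad_(True)
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.ones_like(a_t)
    eps = eps_model(x_t, t)
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    mi = local_mutual_information(source, x0_hat.clamp(0, 1))
    grad = torch.autograd.grad(mi, x_t)[0]
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
    mean = a_prev.sqrt() * x0_hat + (1 - a_prev - sigma ** 2).clamp_min(0).sqrt() * eps
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return (mean + guidance_scale * grad + sigma * noise).detach()

In a full sampler one would iterate t from the last timestep down to 0, calling mi_guided_step at each step, with the source image resampled to the resolution the target-domain diffusion model was trained on; the source never enters training, which is what makes the translation zero-shot with respect to the source domain.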

Abstract (translated)

Cross-modality data translation has attracted great interest in image computing. Deep generative models (e.g., GANs) have shown performance improvements in tackling these problems. However, as a fundamental challenge in image translation, the problem of zero-shot cross-modality data translation with fidelity remains open. This paper proposes a new unsupervised zero-shot-learning method named the Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which translates unseen source data to the target domain. The method leverages a score-matching-based generative model that learns prior knowledge of the target domain. We propose a differentiable local mutual information layer (LMI) to condition the iterative denoising sampling. The LMI captures shared cross-modality features in the statistical domain to guide the diffusion; therefore, no retraining is required when the source domain changes, as no direct mapping between the source and target domains is needed. This advantage is critical for applying cross-modality data translation methods in practice, as a sufficiently large source-domain dataset is not always available for supervised training. We empirically demonstrate the superior performance of MIDiffusion compared with an influential group of generative models, including adversarial-based and other score-matching-based models.

URL

https://arxiv.org/abs/2301.13743

PDF

https://arxiv.org/pdf/2301.13743.pdf

