Paper Reading AI Learner

Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

2024-04-19 16:01:00
Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

Abstract

Recent advances in spatial transcriptomics (ST) make it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expression of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, integrating histology images and gene expression to produce super-resolved ST maps remains challenging. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps under the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to exploit complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationships among multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.
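The conditioning mechanism at the heart of the abstract, cross-attention in which low-resolution gene-expression features query histology-image features, can be sketched as follows. This is a minimal illustrative example, not the authors' implementation; the feature dimensions, array names, and single-head formulation are all assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(gene_feats, hist_feats):
    """Single-head cross-attention sketch: queries come from
    gene-expression spot features, keys/values from histology
    patch features (shapes and roles are assumed for illustration)."""
    d = gene_feats.shape[-1]
    scores = gene_feats @ hist_feats.T / np.sqrt(d)  # (n_spots, n_patches)
    weights = softmax(scores, axis=-1)               # attend over patches
    return weights @ hist_feats                      # fused (n_spots, d)

rng = np.random.default_rng(0)
gene = rng.standard_normal((4, 8))   # 4 tissue spots, 8-dim features
hist = rng.standard_normal((16, 8))  # 16 histology patches, 8-dim features
fused = cross_attention(gene, hist)
print(fused.shape)  # (4, 8)
```

In a conditional diffusion model, a fused representation like `fused` would be injected into the denoising network at each step to guide the super-resolved ST map; the dynamic, hierarchical variant described in the paper would additionally vary which histology scales are attended to.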


URL

https://arxiv.org/abs/2404.12973

PDF

https://arxiv.org/pdf/2404.12973.pdf

