Abstract
Recent advances in acquiring diverse sources of brain data have created new opportunities for integrating multimodal brain data to assist in the early detection of complex brain disorders. However, current data integration approaches typically require a complete set of biomedical data modalities, which may not always be feasible: some modalities are only collected in large-scale research cohorts and are prohibitively expensive to obtain in routine clinical practice. In studies of brain diseases in particular, research cohorts may include both neuroimaging and genetic data, yet in practical clinical diagnosis we often need to predict disease outcomes from neuroimages alone. It is therefore desirable to design machine learning models that can use all available data during training (different modalities can provide complementary information) but conduct inference using only the most common data modality. We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to effectively exploit auxiliary modalities available during training and thereby improve the performance of a unimodal model at inference. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Experimental results demonstrate that our approach outperforms related machine learning and deep learning methods by a significant margin.
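The train-on-both / infer-on-one pattern the abstract describes can be illustrated with a minimal sketch. This is not the paper's architecture (which uses transformers and GANs on ADNI data); it is a hypothetical NumPy toy in which a linear imaging encoder is adversarially encouraged to produce embeddings resembling those of an auxiliary genetic encoder, while a classifier is trained on the imaging embedding alone. All names, dimensions, and synthetic data below are illustrative assumptions.

```python
# Toy sketch (NumPy only, synthetic data): train with both modalities,
# predict from the imaging modality alone.
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_gen, d_emb = 64, 16, 8, 4

# Synthetic stand-ins for paired training data (both modalities available).
X_img = rng.normal(size=(n, d_img))
X_gen = rng.normal(size=(n, d_gen))
y = (X_img[:, 0] + X_gen[:, 0] > 0).astype(float)  # label uses both modalities

W_img = rng.normal(scale=0.1, size=(d_img, d_emb))  # imaging encoder
W_gen = rng.normal(scale=0.1, size=(d_gen, d_emb))  # auxiliary genetic encoder
w_clf = rng.normal(scale=0.1, size=d_emb)           # classifier head
w_dis = rng.normal(scale=0.1, size=d_emb)           # discriminator

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.05

for step in range(200):
    z_img = X_img @ W_img  # embeddings from the imaging branch
    z_gen = X_gen @ W_gen  # embeddings from the auxiliary genetic branch

    # Discriminator step: label genetic embeddings 1, imaging embeddings 0.
    p_img, p_gen = sigmoid(z_img @ w_dis), sigmoid(z_gen @ w_dis)
    w_dis -= lr * (z_img.T @ p_img + z_gen.T @ (p_gen - 1.0)) / n

    # Encoder step: fool the discriminator (non-saturating GAN loss) while
    # also minimizing classification loss computed on imaging embeddings only.
    p_img = sigmoid(z_img @ w_dis)
    p_clf = sigmoid(z_img @ w_clf)
    grad_adv = X_img.T @ ((p_img - 1.0)[:, None] * w_dis[None, :]) / n
    grad_clf = X_img.T @ ((p_clf - y)[:, None] * w_clf[None, :]) / n
    W_img -= lr * (grad_adv + grad_clf)
    w_clf -= lr * (z_img.T @ (p_clf - y)) / n

# Inference path: the genetic branch is discarded; only imaging is needed.
def predict(x_img):
    return (sigmoid(x_img @ W_img @ w_clf) > 0.5).astype(int)

preds = predict(X_img)
acc = (preds == y).mean()
```

The key design point the sketch mirrors is that the auxiliary modality shapes the imaging encoder only through training-time losses, so the deployed model takes a single input.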
URL
https://arxiv.org/abs/2305.16222