Abstract
This study proposes Multitrans, a multi-modal fusion framework based on the Transformer architecture and the self-attention mechanism. The framework jointly analyzes non-contrast computed tomography (NCCT) images and discharge diagnosis reports of patients undergoing stroke treatment, applying several Transformer-based methods to predict the functional outcomes of that treatment. The results show that single-modal text classification performs significantly better than single-modal image classification, and that the multi-modal combination outperforms either single modality. Although the Transformer model performs worse on imaging data alone, when imaging is combined with clinical meta-diagnostic information the two modalities supply complementary information and contribute to accurate prediction of stroke treatment effects.
URL
https://arxiv.org/abs/2404.12634