Abstract
The electrocardiogram (ECG) is one of the most commonly used convenient, non-invasive medical monitoring tools that assist in the clinical diagnosis of heart disease. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in ECG classification. SSL pre-training achieves competitive performance after fine-tuning with only a small amount of annotated data. However, current SSL methods rely on the availability of annotated data and are unable to predict labels that do not exist in the fine-tuning dataset. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to use auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to separately embed paired ECGs and their automatically machine-generated clinical reports. The SSL objective maximizes the similarity between each ECG and its paired auto-generated report while minimizing the similarity between that ECG and other reports. In downstream classification tasks, METS achieves around a 10% performance improvement via zero-shot classification, without using any annotated data, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, despite MIT-BIH containing different ECG classes from the pre-training dataset. Extensive experiments demonstrate the advantages of ECG-Text multimodal self-supervised learning in terms of generalizability, effectiveness, and efficiency.
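The contrastive objective described above (pull each ECG toward its paired report embedding, push it away from other reports) can be sketched as a symmetric CLIP-style loss. This is a minimal illustrative implementation in NumPy, not the authors' actual code; the function name, the temperature value, and the use of cosine similarity are assumptions based on common practice for this family of objectives.

```python
import numpy as np

def ecg_text_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Hypothetical sketch of a symmetric contrastive loss over paired
    ECG and report embeddings (rows of the two matrices are pairs)."""
    # L2-normalize so the dot product is cosine similarity
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = ecg @ txt.T / temperature  # (N, N): row i vs. every report
    n = logits.shape[0]

    def cross_entropy_diag(l):
        # softmax cross-entropy with the true pair (diagonal) as target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average of ECG->report and report->ECG directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

At inference time, the same similarity computation supports zero-shot classification: embed a textual description of each candidate class with the frozen language model, then assign an ECG to the class whose description embedding is most similar, which is how labels absent from any fine-tuning set can still be predicted.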
URL
https://arxiv.org/abs/2303.12311