Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to handle irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs even for input sequences of moderate length. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, which is critical for addressing long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global dependencies in input data, while remaining robust to changes in sequence length and sampling frequency. We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models using a fraction of the computational time and memory resources on synthetic and real-world time-series tasks.
https://arxiv.org/abs/2403.10288
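To make the multi-view signature idea above concrete, the sketch below computes a truncated depth-2 path signature over the whole sequence (global view) and over fixed windows (local view), yielding tokens a standard attention stack could consume. The window size, truncation depth, and time-augmentation are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def signature_depth2(path):
    """Truncated (depth-2) signature of a discrete d-dimensional path.

    Level 1: total increments. Level 2: iterated integrals, approximated
    with exclusive prefix sums over the discrete increments.
    """
    inc = np.diff(path, axis=0)                  # (T-1, d)
    level1 = inc.sum(axis=0)                     # (d,)
    prefix = np.cumsum(inc, axis=0) - inc        # sum of increments before t
    level2 = prefix.T @ inc                      # (d, d)
    return np.concatenate([level1, level2.ravel()])

def multi_view_features(path, window=16):
    """Global signature plus local signatures over fixed windows."""
    global_sig = signature_depth2(path)
    local_sigs = np.stack([
        signature_depth2(path[s:s + window + 1])
        for s in range(0, len(path) - window, window)
    ])
    return global_sig, local_sigs                # tokens for an attention stack

# Irregularly sampled 2-channel series; time is added as an extra channel so
# the signature sees the non-uniform sampling.
t = np.sort(np.random.rand(200))
path = np.stack([t, np.sin(8 * t), np.cos(5 * t)], axis=1)
g, locs = multi_view_features(path)
print(g.shape, locs.shape)                       # (12,) (12, 12)
```

Because the signature summarizes each window regardless of how many samples fall inside it, the resulting tokens are largely insensitive to sequence length and sampling frequency, which is the robustness the abstract claims.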
Deep learning (DL) models have been advancing automatic medical image analysis on various modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach enables DL models to regress ejection fraction (EF) directly from 2D+time echocardiograms, resulting in superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, indicating the existence of spurious correlations, which can negatively affect generalization. To mitigate this issue, we propose CoReEcho, a novel training framework emphasizing continuous representations tailored for direct EF regression. Our extensive experiments demonstrate that CoReEcho: 1) outperforms the current state-of-the-art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic) with an MAE of 3.90 and an R2 of 82.44, and 2) provides robust and generalizable features that transfer more effectively to related downstream tasks. The code is publicly available at this https URL.
https://arxiv.org/abs/2403.10164
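The abstract does not spell out how "continuous representations" are enforced; one plausible reading is a regularizer that makes embedding distances track EF differences across clips. The sketch below is a hypothetical surrogate under that assumption, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def continuity_loss(embeddings, ef):
    """Surrogate: pairwise embedding distances should be structured like
    pairwise EF differences (hypothetical; not the authors' exact loss)."""
    z = F.normalize(embeddings, dim=1)
    d_emb = torch.cdist(z, z)                          # (B, B)
    d_ef = torch.cdist(ef[:, None], ef[:, None])       # (B, B) |EF_i - EF_j|
    return F.mse_loss(d_emb / d_emb.mean().detach().clamp_min(1e-6),
                      d_ef / d_ef.mean().clamp_min(1e-6))

# Joint objective: direct EF regression plus the continuity surrogate.
B, D = 32, 256
z = torch.randn(B, D, requires_grad=True)              # clip embeddings
pred_ef = torch.randn(B, requires_grad=True)           # regression head output
true_ef = torch.rand(B) * 40 + 30                      # EF values in [30, 70]
loss = F.l1_loss(pred_ef, true_ef) + 0.1 * continuity_loss(z, true_ef)
loss.backward()
```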
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal interoperability. eCLIP integrates a heatmap processor and leverages mixup augmentation to efficiently utilize the scarce expert annotations, thus boosting the model's learning effectiveness. eCLIP is designed to be generally applicable to any variant of CLIP without requiring any modifications of the core architecture. Through detailed evaluations across several tasks, including zero-shot inference, linear probing, cross-modal retrieval, and Retrieval Augmented Generation (RAG) of radiology reports using a frozen Large Language Model, eCLIP showcases consistent improvements in embedding quality. The outcomes reveal enhanced alignment and uniformity, affirming eCLIP's capability to harness high-quality annotations for enriched multi-modal analysis in the medical imaging domain.
https://arxiv.org/abs/2403.10153
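A minimal sketch of the two ingredients named above: pooling patch features under an eye-gaze heatmap, and mixup to stretch scarce expert-annotated pairs. The tensor shapes and the pooling form are assumptions; only the mixup formula is standard:

```python
import torch

def heatmap_pool(patch_feats, gaze):
    """Pool patch embeddings weighted by a radiologist eye-gaze heatmap.
    patch_feats: (B, N, D); gaze: (B, N) nonnegative weights (shapes assumed)."""
    w = gaze / gaze.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return (patch_feats * w.unsqueeze(-1)).sum(dim=1)   # (B, D)

def mixup(x, y, alpha=0.4):
    """Standard mixup: convex combinations of examples and of their targets."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

# Stretch a handful of expert-annotated image/report pairs into many mixed ones.
B, N, D = 8, 196, 512
img_emb = heatmap_pool(torch.randn(B, N, D), torch.rand(B, N))
txt_emb = torch.randn(B, D)                             # paired report embeddings
mixed_img, mixed_txt = mixup(img_emb, txt_emb)
```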
Early diagnosis of Alzheimer's Disease (AD) is very important for subsequent medical treatment, and eye movements under special visual stimuli may serve as a potential non-invasive biomarker for detecting cognitive abnormalities in AD patients. In this paper, we propose a depth-induced saliency comparison network (DISCN) for eye movement analysis, which may be used for diagnosing Alzheimer's disease. In DISCN, a salient attention module fuses normal eye movements with RGB and depth maps of visual stimuli using hierarchical salient attention (SAA) to evaluate comprehensive saliency maps, which contain information from both visual stimuli and normal eye movement behaviors. In addition, we introduce a serial attention module (SEA) to emphasize the most abnormal eye movement behaviors, reducing personal bias for a more robust result. According to our experiments, DISCN achieves consistent validity in classifying the eye movements of AD patients and normal controls.
https://arxiv.org/abs/2403.10124
In the wake of the global spread of monkeypox, accurate disease recognition has become crucial. This study introduces an improved SE-InceptionV3 model, embedding the SENet module and incorporating L2 regularization into the InceptionV3 framework to enhance monkeypox disease detection. Utilizing the Kaggle monkeypox dataset, which includes images of monkeypox and similar skin conditions, our model demonstrates a noteworthy accuracy of 96.71% on the test set, outperforming conventional methods and deep learning models. The SENet module's channel attention mechanism significantly elevates feature representation, while L2 regularization ensures robust generalization. Extensive experiments validate the model's superiority in precision, recall, and F1 score, highlighting its effectiveness in differentiating monkeypox lesions in diverse and complex cases. The study not only provides insights into the application of advanced CNN architectures in medical diagnostics but also opens avenues for further research in model optimization and hyperparameter tuning for enhanced disease recognition. this https URL
https://arxiv.org/abs/2403.10087
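The SENet channel attention the abstract credits is the standard squeeze-and-excitation block; a PyTorch sketch follows, with L2 regularization applied the usual way via weight decay (the reduction ratio of 16 is an assumption):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (reduction ratio assumed 16)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        squeeze = x.mean(dim=(2, 3))             # global average pool -> (B, C)
        scale = self.fc(squeeze)[:, :, None, None]
        return x * scale                         # channel reweighting

block = SEBlock(256)
out = block(torch.randn(2, 256, 14, 14))
# L2 regularization as described is typically applied via weight decay:
opt = torch.optim.SGD(block.parameters(), lr=1e-3, weight_decay=1e-4)
```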
Electronic health records include information on patients' status and medical history, which could cover the history of diseases and disorders that could be hereditary. One important use of family history information is in precision health, where the goal is to keep the population healthy with preventative measures. Natural Language Processing (NLP) and machine learning techniques can help identify information that could assist health professionals in recognizing health risks before a condition develops later in life, saving lives and reducing healthcare costs. We survey the literature on techniques from the NLP field that have been developed to utilise digital health records to identify risks of familial diseases. We highlight that rule-based methods are heavily investigated and are still actively used for family history extraction. Still, more recent efforts have been put into building neural models based on large-scale pre-trained language models. In addition to the areas where NLP has successfully been utilised, we also identify the areas where more research is needed to unlock the value of patients' records regarding data collection, task formulation and downstream applications.
https://arxiv.org/abs/2403.09997
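As the survey notes, rule-based extraction is still widely used; a toy illustration of the genre is a regex that pairs a relative mention with a condition mention in the same sentence (the patterns and vocabularies here are invented for illustration, not drawn from any surveyed system):

```python
import re

# Toy rule-based extractor: (relative, condition) pairs within one sentence.
RELATIVE = r"(mother|father|sister|brother|grandmother|grandfather|aunt|uncle)"
CONDITION = r"(diabetes|hypertension|breast cancer|colon cancer|heart disease)"
PATTERN = re.compile(
    rf"\b{RELATIVE}\b[^.]*?\b(?:has|had|diagnosed with|history of)\b[^.]*?\b{CONDITION}\b",
    re.IGNORECASE,
)

def extract_family_history(note):
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(note)]

note = ("Patient denies smoking. Mother was diagnosed with breast cancer at 54. "
        "Father has a history of hypertension.")
print(extract_family_history(note))
# [('mother', 'breast cancer'), ('father', 'hypertension')]
```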
The integration of artificial intelligence (AI) with radiology has marked a transformative era in medical diagnostics. Vision foundation models have been adopted to enhance radiologic imaging analysis. However, the distinct complexities of radiological imaging, including the interpretation of 2D and 3D radiological data, pose unique challenges that existing models, trained on general non-medical images, fail to address adequately. To bridge this gap and capitalize on the diagnostic precision required in medical imaging, we introduce RadCLIP: a pioneering cross-modal foundational model that harnesses Contrastive Language-Image Pre-training (CLIP) to refine radiologic image analysis. RadCLIP incorporates a novel 3D slice pooling mechanism tailored for volumetric image analysis and is trained using a comprehensive and diverse dataset of radiologic image-text pairs. Our evaluations demonstrate that RadCLIP effectively aligns radiological images with their corresponding textual annotations and, at the same time, offers a robust vision backbone for radiologic imagery with significant promise.
https://arxiv.org/abs/2403.09948
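The "3D slice pooling mechanism" is not detailed in the abstract; one common way to realize it is learned attention pooling over per-slice 2D embeddings, sketched below under that assumption:

```python
import torch
import torch.nn as nn

class SlicePool(nn.Module):
    """Learned attention pooling of per-slice embeddings into one volume
    embedding (one plausible form of '3D slice pooling'; details assumed)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, slice_emb):                 # (B, S, D): S slices/volume
        attn = self.score(slice_emb).softmax(dim=1)   # (B, S, 1)
        return (attn * slice_emb).sum(dim=1)          # (B, D)

# Per-slice embeddings, e.g. from a 2D CLIP image encoder (placeholder data).
vol_emb = SlicePool(512)(torch.randn(4, 64, 512))
print(vol_emb.shape)                              # torch.Size([4, 512])
```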
Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through its hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but are also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation on two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on this https URL
https://arxiv.org/abs/2403.09947
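One plausible reading of "orienting local features toward the final distribution of the classifier" is deep supervision: per-stage heads trained to match the final logits' distribution. The sketch below follows that reading; the stage/head pairing and the KL objective are assumptions, not necessarily the authors' formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stage_alignment_loss(stage_feats, final_logits, heads):
    """Pull each stage's pooled local features toward the classifier's final
    distribution via per-stage heads and KL divergence (deep-supervision
    reading; details assumed)."""
    target = final_logits.detach().softmax(dim=1)
    loss = 0.0
    for feat, head in zip(stage_feats, heads):
        pooled = feat.flatten(2).mean(dim=2)      # (B, C_stage)
        loss = loss + F.kl_div(head(pooled).log_softmax(dim=1),
                               target, reduction="batchmean")
    return loss / len(heads)

# Swin-like hierarchy: channels double as spatial resolution halves.
dims, n_cls, B = [96, 192, 384, 768], 5, 2        # 5 KOA grades (KL 0-4)
heads = nn.ModuleList(nn.Linear(d, n_cls) for d in dims)
stage_feats = [torch.randn(B, d, 56 // 2 ** i, 56 // 2 ** i)
               for i, d in enumerate(dims)]
loss = stage_alignment_loss(stage_feats, torch.randn(B, n_cls), heads)
```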
Background and aims Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. Methods We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked-out regions of polyp images, without any labels. We test MSN's ability to be trained on data only from Israel and detect unseen techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies from Japan (354 videos, 128 hours). We also test MSN's ability to predict performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan. Results MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, |z| > 496, p < 10^-8 for both) using the label-free Frechet distance. MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r = 0.79 and 0.37, respectively). With few examples of Japan detector performance to train on, MSN's prediction of Japan performance improves (r = 0.56). Conclusion Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can aid in detecting when data in practice differs from training data, such as between hospitals or when data has meaningfully shifted from training. MSN has potential for application to medical image domains beyond colonoscopy.
https://arxiv.org/abs/2403.09920
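The label-free Fréchet distance used in the results is the standard quantity between Gaussians fitted to two sets of embeddings; a self-contained sketch (the embedding data here is synthetic, standing in for MSN features):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two embedding sets of
    shape (n_samples, dim); no labels are required."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):                  # trim numerical artifacts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

# E.g. embeddings of Israel whitelight frames vs. Japan NBI frames.
rng = np.random.default_rng(0)
d = frechet_distance(rng.normal(size=(500, 64)),
                     rng.normal(loc=0.5, size=(500, 64)))
print(round(float(d), 2))
```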
Data augmentation is one of the most effective techniques to improve the generalization performance of deep neural networks. Yet, despite often facing limited data availability in medical image analysis, it is frequently underutilized. This appears to be due to a gap in our collective understanding of the efficacy of different augmentation techniques across medical imaging tasks and modalities. One domain where this is especially true is breast ultrasound images. This work addresses this issue by analyzing the effectiveness of different augmentation techniques for the classification of breast lesions in ultrasound images. We assess the generalizability of our findings across several datasets, demonstrate that certain augmentations are far more effective than others, and show that their usage leads to significant performance gains.
https://arxiv.org/abs/2403.09828
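For context, a candidate augmentation pipeline of the kind such a study would compare might look as follows; which transforms actually help breast ultrasound classification is precisely the paper's empirical question, so the choices here are illustrative only:

```python
from PIL import Image
from torchvision import transforms

# Candidate augmentations for grayscale breast ultrasound (illustrative).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

img = Image.new("L", (320, 240))                  # stand-in for a BUS frame
x = train_tf(img)
print(x.shape)                                    # torch.Size([1, 224, 224])
```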
Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks, rapid inference times are necessary. High memory requirements and long processing delays remain constraints that hinder the adoption of SAMs for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPS. To address these challenges, we present FastSAM3D which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention to replace vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door for low-cost truly interactive SAM-based 3D medical imaging segmentation with commonly used GPU hardware. Code is available at this https URL.
https://arxiv.org/abs/2403.09827
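A minimal sketch of layer-wise distillation from a 12-layer teacher to a 6-layer student: each student layer is matched to every second teacher layer with an MSE on hidden states. The pairing, the loss, and the assumption that teacher states are projected to the student width are all illustrative; the paper's scheme is additionally progressive, distilling layer by layer:

```python
import torch
import torch.nn.functional as F

def layerwise_distill_loss(student_hidden, teacher_hidden):
    """Match each of 6 student layers to every 2nd of 12 teacher layers with
    MSE on hidden states (pairing and loss are illustrative assumptions)."""
    assert len(teacher_hidden) == 2 * len(student_hidden)
    loss = 0.0
    for i, s in enumerate(student_hidden):
        loss = loss + F.mse_loss(s, teacher_hidden[2 * i + 1].detach())
    return loss / len(student_hidden)

B, N, D = 2, 8 ** 3, 192                          # 3D patch tokens, tiny width
student = [torch.randn(B, N, D, requires_grad=True) for _ in range(6)]
teacher = [torch.randn(B, N, D) for _ in range(12)]  # assumed projected to D
loss = layerwise_distill_loss(student, teacher)
loss.backward()
```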
This paper investigates the development and optimization of control algorithms for mobile robotics, with a keen focus on their implementation in Field-Programmable Gate Arrays (FPGAs). It delves into both classical control approaches such as PID and modern techniques including deep learning, addressing their application in sectors ranging from industrial automation to medical care. The study highlights the practical challenges and advancements in embedding these algorithms into FPGAs, which offer significant benefits for mobile robotics due to their high-speed processing and parallel computation capabilities. Through an analysis of various control strategies, the paper showcases the improvements in robot performance, particularly in navigation and obstacle avoidance. It emphasizes the critical role of FPGAs in enhancing the efficiency and adaptability of control algorithms in dynamic environments. Additionally, the research discusses the difficulties in benchmarking and evaluating the performance of these algorithms in real-world applications, suggesting a need for standardized evaluation criteria. The contribution of this work lies in its comprehensive examination of control algorithms' potential in FPGA-based mobile robotics, offering insights into future research directions for improving robotic autonomy and operational efficiency.
https://arxiv.org/abs/2403.09459
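Among the classical approaches the paper discusses, PID is the simplest to sketch. The update below is written in floating point; an FPGA implementation would typically realize the same three terms in fixed-point arithmetic with pipelined multipliers:

```python
class PID:
    """Discrete PID controller (textbook form; gains here are illustrative)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a toy first-order heading model toward 1.0 rad.
pid, heading = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01), 0.0
for _ in range(500):
    heading += pid.step(1.0, heading) * 0.01      # simple integrator plant
print(round(heading, 3))                          # approaches 1.0
```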
Causal generative modelling is gaining interest in medical imaging due to its ability to answer interventional and counterfactual queries. Most work focuses on generating counterfactual images that look plausible, using auxiliary classifiers to enforce effectiveness of simulated interventions. We investigate pitfalls in this approach, discovering the issue of attribute amplification, where unrelated attributes are spuriously affected during interventions, leading to biases across protected characteristics and disease status. We show that attribute amplification is caused by the use of hard labels in the counterfactual training process and propose soft counterfactual fine-tuning to mitigate this issue. Our method substantially reduces the amplification effect while maintaining effectiveness of generated images, demonstrated on a large chest X-ray dataset. Our work makes an important advancement towards more faithful and unbiased causal modelling in medical imaging.
https://arxiv.org/abs/2403.09422
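The core fix can be illustrated as a one-line change in the counterfactual "effectiveness" loss: for attributes that were not intervened on, supervise with the classifier's soft output on the factual image instead of a hard label. The exact loss form below is an assumption, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def effectiveness_loss(cf_logits, target, soft=True):
    """Effectiveness loss for an attribute NOT intervened on. Hard labels push
    the counterfactual toward a one-hot extreme (amplification); soft targets
    keep it at the factual prediction (loss form assumed)."""
    if soft:
        return F.kl_div(cf_logits.log_softmax(dim=1), target,
                        reduction="batchmean")    # target: soft probabilities
    return F.cross_entropy(cf_logits, target)     # target: hard class indices

B, n_cls = 4, 2
factual_logits = torch.randn(B, n_cls)            # classifier on real image
cf_logits = torch.randn(B, n_cls, requires_grad=True)
soft_target = factual_logits.softmax(dim=1).detach()
loss = effectiveness_loss(cf_logits, soft_target, soft=True)
loss.backward()
```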
Utilizing potent representations of the large vision-language models (VLMs) to accomplish various downstream tasks has attracted increasing attention. Within this research field, soft prompt learning has become a representative approach for efficiently adapting VLMs such as CLIP, to tasks like image classification. However, most existing prompt learning methods learn text tokens that are unexplainable, which cannot satisfy the stringent interpretability requirements of Explainable Artificial Intelligence (XAI) in high-stakes scenarios like healthcare. To address this issue, we propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts at multiple granularities. Moreover, our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models and offers both visual and textual explanations for the prompts. Extensive experiments and explainability analyses conducted on various datasets, with and without concept labels, demonstrate that our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI. The code will be made publicly available.
https://arxiv.org/abs/2403.09410
Medical data often exhibits distribution shifts, which cause test-time performance degradation for deep learning models trained using standard supervised learning pipelines. This challenge is addressed in the field of Domain Generalization (DG), with the sub-field of Single Domain Generalization (SDG) being of particular interest due to the privacy- or logistics-related issues often associated with medical data. Existing disentanglement-based SDG methods heavily rely on structural information embedded in segmentation masks; classification labels, however, do not provide such dense information. This work introduces a novel SDG method aimed at medical image classification that leverages channel-wise contrastive disentanglement. It is further enhanced with reconstruction-based style regularization to ensure extraction of distinct style and structure feature representations. We evaluate our method on the complex task of multicenter histopathology image classification, comparing it against state-of-the-art (SOTA) SDG baselines. Results demonstrate that our method surpasses the SOTA by a margin of 1% in average accuracy while also showing more stable performance. This study highlights the importance and challenges of exploring SDG frameworks in the context of the classification task. The code is publicly available at this https URL
https://arxiv.org/abs/2403.09400
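Channel-wise contrastive disentanglement can be sketched as partitioning feature channels into structure and style halves and contrasting the structure half across two augmented views with InfoNCE; the split point, temperature, and loss form below are assumptions:

```python
import torch
import torch.nn.functional as F

def structure_contrastive_loss(f1, f2, tau=0.2):
    """Contrast the 'structure' half of the channels across two augmented
    views of the same slide with InfoNCE (split and temperature assumed)."""
    C = f1.size(1) // 2                            # first half = structure
    s1 = F.normalize(f1[:, :C], dim=1)
    s2 = F.normalize(f2[:, :C], dim=1)
    logits = s1 @ s2.T / tau                       # (B, B) similarity matrix
    targets = torch.arange(f1.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)

B, C = 16, 512
loss = structure_contrastive_loss(torch.randn(B, C), torch.randn(B, C))
```

The remaining style channels would then feed the reconstruction-based style regularization the abstract mentions, keeping the two partitions distinct.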
The Internet of Medical Things (IoMT) transcends traditional medical boundaries, enabling a transition from reactive treatment to proactive prevention. This innovative method revolutionizes healthcare by facilitating early disease detection and tailored care, particularly in chronic disease management, where IoMT automates treatments based on real-time health data collection. Nonetheless, its benefits are countered by significant security challenges that endanger the lives of its users due to the sensitivity and value of the processed data, thereby attracting malicious interests. Moreover, the utilization of wireless communication for data transmission exposes medical data to interception and tampering by cybercriminals. Additionally, anomalies may arise due to human errors, network interference, or hardware malfunctions. In this context, anomaly detection based on Machine Learning (ML) is an interesting solution, but it comes up against obstacles in terms of explicability and protection of privacy. To address these challenges, a new framework for Intrusion Detection Systems (IDS) is introduced, leveraging Artificial Neural Networks (ANN) for intrusion detection while utilizing Federated Learning (FL) for privacy preservation. Additionally, eXplainable Artificial Intelligence (XAI) methods are incorporated to enhance model explanation and interpretation. The efficacy of the proposed framework is evaluated and compared with centralized approaches using multiple datasets containing network and medical data, simulating various attack types impacting the confidentiality, integrity, and availability of medical and physiological data. The results obtained offer compelling evidence that the FL method performs comparably to the centralized method, demonstrating high performance. Additionally, it affords the dual advantage of safeguarding privacy and providing model explanation.
https://arxiv.org/abs/2403.09752
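The privacy-preserving ingredient, Federated Learning, is easiest to see in the aggregation step. The sketch below assumes plain FedAvg (weighted parameter averaging); the paper's exact aggregation rule is not stated in the abstract:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client ANN parameters; raw medical and network
    records never leave the clients, only parameters are shared."""
    total = sum(client_sizes)
    return [
        sum(n / total * w[layer] for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Three hospitals with differently sized local IDS training sets.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(10, 5)), rng.normal(size=5)] for _ in range(3)]
global_model = fedavg(clients, client_sizes=[1200, 800, 500])
print([p.shape for p in global_model])            # [(10, 5), (5,)]
```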
Medical anomaly detection aims to identify abnormal findings using only normal training data, playing a crucial role in health screening and recognizing rare diseases. Reconstruction-based methods, particularly those utilizing autoencoders (AEs), are dominant in this field. They work under the assumption that AEs trained on only normal data cannot reconstruct unseen abnormal regions well, thereby enabling the anomaly detection based on reconstruction errors. However, this assumption does not always hold due to the mismatch between the reconstruction training objective and the anomaly detection task objective, rendering these methods theoretically unsound. This study focuses on providing a theoretical foundation for AE-based reconstruction methods in anomaly detection. By leveraging information theory, we elucidate the principles of these methods and reveal that the key to improving AE in anomaly detection lies in minimizing the information entropy of latent vectors. Experiments on four datasets with two image modalities validate the effectiveness of our theory. To the best of our knowledge, this is the first effort to theoretically clarify the principles and design philosophy of AE for anomaly detection. Code will be available upon acceptance.
https://arxiv.org/abs/2403.09303
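The paper's prescription, minimizing the information entropy of latent vectors, could be instantiated with a differentiable batch estimator; the Gaussian-entropy proxy below is one such choice, not the paper's exact estimator:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_latent_entropy(z, eps=1e-4):
    """Batch entropy estimate under a Gaussian fit: H = 0.5*logdet(cov)+const.
    A differentiable proxy for 'minimize latent entropy'; the constant is
    dropped since it does not affect gradients."""
    zc = z - z.mean(dim=0, keepdim=True)
    cov = zc.T @ zc / (z.size(0) - 1) + eps * torch.eye(z.size(1))
    return 0.5 * torch.logdet(cov)

class AE(nn.Module):
    def __init__(self, d_in=784, d_z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

x = torch.randn(64, 784)                           # normal-only training batch
recon, z = AE()(x)
loss = F.mse_loss(recon, x) + 0.01 * gaussian_latent_entropy(z)
loss.backward()
```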
Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence>, and fully utilize each element as supervision to enhance representation learning. For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, considering them as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample and constructing soft labels for contrastive learning to improve the semantic association of different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks, including five public benchmarks. Experimental results demonstrate that our method outperforms the state-of-the-art methods.
https://arxiv.org/abs/2403.09294
Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our code is made available at this https URL.
https://arxiv.org/abs/2403.09290
Automated segmentation proves to be a valuable tool in precisely detecting tumors within medical images. The accurate identification and segmentation of tumor types hold paramount importance in diagnosing, monitoring, and treating highly fatal brain tumors. The BraTS challenge serves as a platform for researchers to tackle this issue by participating in open challenges focused on tumor segmentation. This study outlines our methodology for segmenting tumors in the context of two distinct tasks from the BraTS 2023 challenge: Adult Glioma and Pediatric Tumors. Our approach leverages two encoder-decoder-based CNN models, namely SegResNet and MedNeXt, for segmenting three distinct subregions of tumors. We further introduce a set of robust postprocessing steps to improve the segmentation, especially for the newly introduced BraTS 2023 metrics. The specifics of our approach and comprehensive performance analyses are expounded upon in this work. Our proposed approach achieves third place in the BraTS 2023 Adult Glioma Segmentation Challenge with average Dice and HD95 scores of 0.8313 and 36.38 on the test set, respectively.
https://arxiv.org/abs/2403.09262
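Robust postprocessing for the new BraTS 2023 lesion-wise metrics commonly includes removing small connected components, since every spurious component is penalized; a sketch with an assumed volume threshold (the paper's exact rules are not given in the abstract):

```python
import numpy as np
from scipy import ndimage

def remove_small_components(mask, min_voxels=50):
    """Drop connected components smaller than min_voxels from a binary
    tumor-subregion mask; lesion-wise BraTS 2023 metrics penalize every
    spurious component (threshold assumed)."""
    labeled, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_voxels]
    return np.isin(labeled, keep).astype(mask.dtype)

seg = np.zeros((64, 64, 64), dtype=np.uint8)
seg[20:30, 20:30, 20:30] = 1       # plausible lesion: 1000 voxels
seg[2:4, 2:4, 2:4] = 1             # 8-voxel speck, likely a false positive
clean = remove_small_components(seg)
print(int(seg.sum()), int(clean.sum()))   # 1008 1000
```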