The evaluation of segmentation performance is a common task in biomedical image analysis, with its importance emphasized in the recently released metrics selection guidelines and computing frameworks. To quantitatively evaluate the alignment of two segmentations, researchers commonly resort to counting metrics, such as the Dice similarity coefficient, or distance-based metrics, such as the Hausdorff distance, which are usually computed by publicly available open-source tools with an inherent assumption that these tools provide consistent results. In this study, we questioned this assumption and performed a systematic implementation analysis, along with quantitative experiments on real-world clinical data, to compare 11 open-source tools for distance-based metrics computation against our highly accurate mesh-based reference implementation. The results revealed statistically significant differences among all open-source tools, which is both surprising and concerning, since it calls into question the validity of existing studies. Besides identifying the main sources of variation, we also provide recommendations for distance-based metrics computation.
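For concreteness, here is a minimal sketch of the two metric families in Python, assuming binary masks on a shared voxel grid. The function names and the toy cube example are illustrative, and the voxel-center approach shown here is exactly the kind of implementation choice (as opposed to the paper's mesh-based reference) that can drive the discrepancies the study reports:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance between foreground voxel centers.
    Tools differ in boundary extraction and spacing handling, which is one
    plausible source of the reported variation."""
    pa = np.argwhere(a) * np.asarray(spacing)  # voxel indices -> physical mm
    pb = np.argwhere(b) * np.asarray(spacing)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# Toy example: two overlapping cubes.
x = np.zeros((32, 32, 32), bool); x[4:20, 4:20, 4:20] = True
y = np.zeros((32, 32, 32), bool); y[6:22, 6:22, 6:22] = True
print(dice(x, y), hausdorff(x, y))
```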
https://arxiv.org/abs/2410.02630
We introduce ColaCare, a framework that enhances Electronic Health Record (EHR) modeling through multi-agent collaboration driven by Large Language Models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by clinical consultations, ColaCare employs two types of agents: DoctorAgent and MetaAgent, which collaboratively analyze patient data. Expert models process and generate predictions from numerical EHR data, while LLM agents produce reasoning references and decision-making reports within the collaborative consultation framework. We additionally incorporate the Merck Manual of Diagnosis and Therapy (MSD) medical guideline within a retrieval-augmented generation (RAG) module for authoritative evidence support. Extensive experiments conducted on four distinct EHR datasets demonstrate ColaCare's superior performance in mortality prediction tasks, underscoring its potential to revolutionize clinical decision support systems and advance personalized precision medicine. The code, complete prompt templates, and additional case studies are publicly available at the anonymous link: this https URL.
https://arxiv.org/abs/2410.02551
Deformable image registration is crucial for aligning medical images in a non-linear fashion across different modalities, allowing for precise spatial correspondence between varying anatomical structures. This paper presents NestedMorph, a novel network utilizing a Nested Attention Fusion approach to improve intra-subject deformable registration between T1-weighted (T1w) MRI and diffusion MRI (dMRI) data. NestedMorph integrates high-resolution spatial details from an encoder with semantic information from a decoder using a multi-scale framework, enhancing both local and global feature extraction. Our model notably outperforms existing methods, including CNN-based approaches like VoxelMorph, MIDIR, and CycleMorph, as well as Transformer-based models such as TransMorph and ViT-V-Net, and traditional techniques like NiftyReg and SyN. Evaluations on the HCP dataset demonstrate that NestedMorph achieves superior performance across key metrics, including SSIM, HD95, and SDlogJ, with the highest SSIM of 0.89 and the lowest HD95 of 2.5 and SDlogJ of 0.22. These results highlight NestedMorph's ability to capture both local and global image features effectively, leading to superior registration performance. The promising outcomes of this study underscore NestedMorph's potential to significantly advance deformable medical image registration, providing a robust framework for future research and clinical applications. The source code and our implementation are available at: this https URL
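For reference, SDlogJ is commonly defined as the standard deviation of the log Jacobian determinant of the deformation, a measure of how smoothly (and invertibly) the transform deforms space. A minimal numpy sketch under that assumption follows; the function name and the displacement-field layout are illustrative, not taken from the paper:

```python
import numpy as np

def sd_log_jacobian(disp: np.ndarray) -> float:
    """disp: displacement field of shape (3, D, H, W) in voxel units.
    Returns std of log|J| for the transform x -> x + disp(x)."""
    grads = [np.gradient(disp[i], axis=(0, 1, 2)) for i in range(3)]
    J = np.stack([np.stack(g, 0) for g in grads], 0)  # J[i, j] = d disp_i / d x_j
    J = J + np.eye(3).reshape(3, 3, 1, 1, 1)          # add identity of x -> x
    J = np.moveaxis(J, (0, 1), (-2, -1))              # (D, H, W, 3, 3)
    det = np.clip(np.linalg.det(J), 1e-9, None)       # guard against folding
    return float(np.log(det).std())

# Toy usage: a small random displacement gives log|J| near zero.
disp = 0.05 * np.random.default_rng(0).standard_normal((3, 16, 16, 16))
print(sd_log_jacobian(disp))
```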
https://arxiv.org/abs/2410.02550
Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: this https URL
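A minimal sketch of the core idea follows: ViT patch tokens are routed through a frozen pretrained transformer block behind trainable linear adapters. The class name, dimensions, and the stand-in encoder layer are illustrative assumptions, not the paper's exact design, which would load a block from an actual pretrained LLM:

```python
import torch
import torch.nn as nn

class FrozenLLMBlockAdapter(nn.Module):
    """Route ViT tokens through a frozen transformer block, with linear
    adapters to match hidden sizes; a residual keeps training stable."""
    def __init__(self, vit_dim: int, llm_block: nn.Module, llm_dim: int):
        super().__init__()
        self.up = nn.Linear(vit_dim, llm_dim)
        self.block = llm_block
        self.down = nn.Linear(llm_dim, vit_dim)
        for p in self.block.parameters():   # freeze the pretrained weights
            p.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, vit_dim)
        return tokens + self.down(self.block(self.up(tokens)))

# Usage with a stand-in block (a real setup would take a pretrained LLM layer).
blk = nn.TransformerEncoderLayer(d_model=1024, nhead=8, batch_first=True)
adapter = FrozenLLMBlockAdapter(vit_dim=384, llm_block=blk, llm_dim=1024)
out = adapter(torch.randn(2, 196, 384))
```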
https://arxiv.org/abs/2410.02458
Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning. Although models based on convolutional neural networks (CNNs) and Transformers have achieved remarkable success in medical image segmentation tasks, they still face challenges such as high computational complexity and the loss of local features when capturing long-range dependencies. To address these limitations, we propose Med-TTT, a visual backbone network integrated with Test-Time Training (TTT) layers, which incorporates dynamic adjustment capabilities. Med-TTT introduces the Vision-TTT layer, which enables effective modeling of long-range dependencies with linear computational complexity and adaptive parameter adjustment during inference. Furthermore, we designed a multi-resolution fusion mechanism to combine image features at different scales, facilitating the identification of subtle lesion characteristics in complex backgrounds. At the same time, we adopt a frequency-domain feature enhancement strategy based on high-pass filtering, which can better capture texture and fine-grained details in images. Experimental results demonstrate that Med-TTT significantly outperforms existing methods on multiple medical image datasets, exhibiting strong segmentation capabilities, particularly in complex image backgrounds. The model achieves leading performance in terms of accuracy, sensitivity, and Dice coefficient, providing an efficient and robust solution for the field of medical image segmentation. The code is available at this https URL.
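A minimal sketch of frequency-domain high-pass enhancement for a 2D slice follows; the cutoff, blend weight, and ideal-mask choice are illustrative assumptions, and the paper's exact filter design may differ:

```python
import numpy as np

def highpass_enhance(img: np.ndarray, cutoff: float = 0.1, weight: float = 0.5):
    """Suppress frequencies below `cutoff` (fraction of Nyquist) and blend
    the residual back in, emphasizing texture and fine detail."""
    f = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.5  # 1.0 == Nyquist
    f[radius < cutoff] = 0                                        # ideal high-pass
    high = np.real(np.fft.ifft2(np.fft.ifftshift(f)))
    return img + weight * high

# Toy usage on a random "slice".
enhanced = highpass_enhance(np.random.rand(128, 128))
```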
https://arxiv.org/abs/2410.02523
Data augmentation, a cornerstone technique in deep learning, is crucial in enhancing model performance, especially with scarce labeled data. While traditional techniques are effective, their reliance on hand-crafted methods limits their applicability across diverse data types and tasks. Although modern learnable augmentation methods offer increased adaptability, they are computationally expensive and challenging to incorporate within prevalent augmentation workflows. In this work, we present a novel, efficient method for data augmentation, effectively bridging the gap between existing augmentation strategies and emerging datasets and learning tasks. We introduce SAFLEX (Self-Adaptive Augmentation via Feature Label EXtrapolation), which learns the sample weights and soft labels of augmented samples provided by any given upstream augmentation pipeline, using a specifically designed efficient bilevel optimization algorithm. Remarkably, SAFLEX effectively reduces the noise and label errors of the upstream augmentation pipeline with a marginal computational cost. As a versatile module, SAFLEX excels across diverse datasets, including natural and medical images and tabular data, showcasing its prowess in few-shot learning and out-of-distribution generalization. SAFLEX seamlessly integrates with common augmentation strategies like RandAug, CutMix, and those from large pre-trained generative models like stable diffusion and is also compatible with frameworks such as CLIP's fine-tuning. Our findings highlight the potential to adapt existing augmentation pipelines for new data types and tasks, signaling a move towards more adaptable and resilient training frameworks.
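A minimal sketch of how such learned sample weights and soft labels would enter the downstream training loss; the bilevel optimization that produces them on a validation set is omitted, and all names here are illustrative:

```python
import torch
import torch.nn as nn

def weighted_soft_ce(logits, soft_labels, sample_w):
    """Cross-entropy against soft labels, weighted per augmented sample."""
    per_sample = -(soft_labels * logits.log_softmax(-1)).sum(-1)
    return (sample_w * per_sample).mean()

# Toy usage: 8 augmented samples, 5 classes. In a SAFLEX-style setup the
# weights and soft labels come from the bilevel procedure, not fixed values.
model = nn.Linear(32, 5)
x_aug = torch.randn(8, 32)
soft_labels = torch.softmax(torch.randn(8, 5), dim=-1)
sample_w = torch.sigmoid(torch.randn(8))
loss = weighted_soft_ce(model(x_aug), soft_labels, sample_w)
loss.backward()
```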
https://arxiv.org/abs/2410.02512
Mamba, a special case of the State Space Model, is gaining popularity as an alternative to template-based deep learning approaches in medical image analysis. While transformers are powerful architectures, they have drawbacks, including quadratic computational complexity and an inability to address long-range dependencies efficiently. This limitation affects the analysis of large and complex datasets in medical imaging, where there are many spatial and temporal relationships. In contrast, Mamba offers benefits that make it well-suited for medical image analysis. It has linear time complexity, which is a significant improvement over transformers. Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory. Mamba also demonstrates strong performance in merging multimodal data, improving diagnostic accuracy and patient outcomes. The organization of this paper allows readers to appreciate the capabilities of Mamba in medical imaging step by step. We begin by defining core concepts of SSMs and models, including S4, S5, and S6, followed by an exploration of Mamba architectures such as pure Mamba, U-Net variants, and hybrid models with convolutional neural networks, transformers, and Graph Neural Networks. We also cover Mamba optimizations, techniques and adaptations, scanning strategies, datasets, applications, and experimental results, and conclude with its challenges and future directions in medical imaging. This review aims to demonstrate the transformative potential of Mamba in overcoming existing barriers within medical imaging while paving the way for innovative advancements in the field. A comprehensive list of the Mamba architectures applied in the medical field and reviewed in this work is available on GitHub.
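A minimal sketch of the discrete state-space recurrence that gives these models their linear time complexity; selective models like Mamba additionally make the (discretized) A, B, C depend on the input at each step, which is omitted here for brevity:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential scan of h_t = A h_{t-1} + B x_t,  y_t = C h_t,
    for a scalar input sequence x and state dimension len(B)."""
    h = np.zeros(B.shape[0])
    y = np.empty(len(x))
    for t in range(len(x)):
        h = A @ h + B * x[t]      # state update, O(state) per step
        y[t] = C @ h              # readout
    return y

# Toy usage: a length-64 sequence through a 4-dimensional state.
rng = np.random.default_rng(0)
y = ssm_scan(rng.standard_normal(64), 0.9 * np.eye(4),
             rng.standard_normal(4), rng.standard_normal(4))
```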
https://arxiv.org/abs/2410.02362
The increasing demand for transparent and reliable models, particularly in high-stakes decision-making areas such as medical image analysis, has led to the emergence of eXplainable Artificial Intelligence (XAI). Post-hoc XAI techniques, which aim to explain black-box models after training, have been controversial in recent works concerning their fidelity to the models' predictions. In contrast, Self-eXplainable AI (S-XAI) offers a compelling alternative by incorporating explainability directly into the training process of deep learning models. This approach allows models to generate inherent explanations that are closely aligned with their internal decision-making processes. Such enhanced transparency significantly supports the trustworthiness, robustness, and accountability of AI systems in real-world medical applications. To facilitate the development of S-XAI methods for medical image analysis, this survey presents a comprehensive review across various image modalities and clinical applications. It covers more than 200 papers from three key perspectives: 1) input explainability through the integration of explainable feature engineering and knowledge graphs, 2) model explainability via attention-based learning, concept-based learning, and prototype-based learning, and 3) output explainability by providing counterfactual explanations and textual explanations. Additionally, this paper outlines the desired characteristics of explainability and the existing methods for evaluating explanation quality. Finally, it discusses the major challenges and future research directions in developing S-XAI for medical image analysis.
https://arxiv.org/abs/2410.02331
Medical image analysis tasks often focus on regions or structures located in a particular location within the patient's body. Often, large parts of the image may not be of interest for the image analysis task. When using deep-learning-based approaches, this unnecessarily increases the computational burden during inference and raises the chance of errors. In this paper, we introduce CTARR, a novel generic method for CT Anatomical Region Recognition. The method serves as a pre-processing step for any deep learning-based CT image analysis pipeline by automatically identifying the pre-defined anatomical region that is relevant for the follow-up task and removing the rest. It can be used in (i) image segmentation, to prevent false positives in anatomically implausible regions and to speed up inference, (ii) image classification, to produce image crops that are consistent in their anatomical context, and (iii) image registration, by serving as a fast pre-registration step. Our proposed method is based on atlas registration and provides a fast and robust way to crop any anatomical region, encoded as one or multiple bounding box(es), from any unlabeled CT scan of the brain, chest, abdomen and/or pelvis. We demonstrate the utility and robustness of the proposed method in the context of medical image segmentation by evaluating it on six datasets of public segmentation challenges. The foreground voxels in the regions of interest are preserved in the vast majority of cases and tasks (97.45-100%), while taking only a fraction of a second to compute (0.1-0.21 s) on a deep learning workstation and greatly reducing the segmentation runtime (2.0-12.7x). Our code is available at this https URL.
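A minimal sketch of the crop step under the stated atlas-registration idea, assuming an affine mapping from scan voxels to atlas millimeters is already available from registration; the names and the box convention are illustrative:

```python
import numpy as np

def crop_to_region(scan: np.ndarray, atlas_box, affine):
    """Map a bounding box defined in atlas space into the scan's voxel
    space via the registration affine, then slice.
    atlas_box: ((min_x, min_y, min_z), (max_x, max_y, max_z)) in atlas mm.
    affine:    4x4 matrix taking scan voxel coords to atlas mm."""
    lo, hi = (np.asarray(p, float) for p in atlas_box)
    corners = np.array([[x, y, z, 1.0] for x in (lo[0], hi[0])
                                       for y in (lo[1], hi[1])
                                       for z in (lo[2], hi[2])])
    vox = (np.linalg.inv(affine) @ corners.T).T[:, :3]  # atlas mm -> scan voxels
    lo_v = np.maximum(np.floor(vox.min(0)).astype(int), 0)
    hi_v = np.minimum(np.ceil(vox.max(0)).astype(int), scan.shape)
    return scan[lo_v[0]:hi_v[0], lo_v[1]:hi_v[1], lo_v[2]:hi_v[2]]

# Toy usage with an identity registration.
crop = crop_to_region(np.random.rand(64, 64, 64),
                      ((10.0, 10.0, 10.0), (40.0, 40.0, 40.0)), np.eye(4))
```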
https://arxiv.org/abs/2410.02316
Convolutional neural networks (CNNs) have shown great effectiveness in medical image segmentation. However, they may be limited in modeling large inter-subject variations in organ shapes and sizes and in exploiting global long-range contextual information. This is because CNNs typically employ convolutions with fixed-sized local receptive fields and lack the mechanisms to utilize global information. To address these limitations, we developed Dynamic Multi-Resolution Convolution (DMRC) and Dynamic Multi-Scale Convolution (DMSC) modules. Both modules enhance the representation capabilities of single convolutions to capture features at varying scales and global contextual information. This is achieved in the DMRC module by employing a convolutional filter on images with different resolutions and subsequently utilizing dynamic mechanisms to model global inter-dependencies between features. In contrast, the DMSC module extracts features at different scales by employing convolutions with different kernel sizes and utilizing dynamic mechanisms to extract global contextual information. The utilization of convolutions with different kernel sizes in the DMSC module may increase computational complexity. To lessen this burden, we propose a lightweight design for convolution layers with a large kernel size. Thus, the DMSC and DMRC modules are designed as lightweight drop-in replacements for single convolutions, and they can be easily integrated into general CNN architectures for end-to-end training. We propose a segmentation network, termed the Dynamic Multi-scale and Multi-resolution Convolution network (DMC-Net), by incorporating our DMSC and DMRC modules into a standard U-Net architecture. The results demonstrate that our proposed DMSC and DMRC modules can enhance the representation capabilities of single convolutions and improve segmentation accuracy.
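A minimal PyTorch sketch in the spirit of DMSC: parallel kernels of different sizes, a lightweight depthwise design for the large kernel, and a dynamic gate for fusion. All layer choices here are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Drop-in replacement for a single conv: a 3x3 branch and a depthwise
    7x7 branch (lightweight large kernel), fused by a per-channel gate
    computed from global context."""
    def __init__(self, ch: int):
        super().__init__()
        self.k3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.k7 = nn.Conv2d(ch, ch, 7, padding=3, groups=ch)  # depthwise
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        g = self.gate(x)                    # global context -> channel weights
        return g * self.k3(x) + (1 - g) * self.k7(x)

# Toy usage on a 16-channel feature map.
y = MultiScaleConv(16)(torch.randn(2, 16, 64, 64))
```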
https://arxiv.org/abs/2410.02129
Quantum Machine Learning (QML) is a red-hot field that brings novel discoveries and exciting opportunities to resolve, speed up, or refine the analysis of a wide range of computational problems. In the realm of biomedical research and personalized medicine, the significance of multi-omics integration lies in its ability to provide a thorough and holistic comprehension of complex biological systems. This technology links fundamental research to clinical practice. The insights gained from integrated omics data can be translated into clinical tools for diagnosis, prognosis, and treatment planning. The fusion of quantum computing and machine learning holds promise for unraveling complex patterns within multi-omics datasets, providing unprecedented insights into the molecular landscape of lung cancer. Due to the heterogeneity, complexity, and high dimensionality of multi-omic cancer data, characterized by the vast number of features (such as gene expression, micro-RNA, and DNA methylation) relative to the limited number of lung cancer patient samples, our prime motivation for this paper is the integration of multi-omic data, unique feature selection, and diagnostic classification of lung subtypes: lung squamous cell carcinoma (LUSC-I) and lung adenocarcinoma (LUAD-II) using quantum machine learning. We developed a method for finding the best differentiating features between LUAD and LUSC datasets, which has the potential for biomarker discovery.
https://arxiv.org/abs/2410.02085
Large language models (LLMs) have demonstrated remarkable progress in healthcare. However, a significant gap remains regarding LLMs' professionalism in domain-specific clinical practices, limiting their application in real-world diagnostics. In this work, we introduce ZODIAC, an LLM-powered framework with cardiologist-level professionalism designed to engage LLMs in cardiological diagnostics. ZODIAC assists cardiologists by extracting clinically relevant characteristics from patient data, detecting significant arrhythmias, and generating preliminary reports for the review and refinement by cardiologists. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework, enabling the processing of patient data across multiple modalities. Each LLM agent is fine-tuned using real-world patient data adjudicated by cardiologists, reinforcing the model's professionalism. ZODIAC undergoes rigorous clinical validation with independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address security concerns. Results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs like Microsoft's BioGPT. ZODIAC demonstrates the transformative potential of specialized LLMs in healthcare by delivering domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, exemplifying the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD).
https://arxiv.org/abs/2410.02026
Long-tailed learning is considered to be an extremely challenging problem in data imbalance learning. It aims to train well-generalized models from a large number of images that follow a long-tailed class distribution. In the medical field, many diagnostic imaging exams, such as dermoscopy and chest radiography, yield a long-tailed distribution of complex clinical findings. Recently, long-tailed learning in medical image analysis has garnered significant attention. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often leads to unfair comparisons and inconclusive results. To help the community improve evaluation and advance the field, we build a unified, well-structured codebase called Medical OpeN-source Long-taIled ClassifiCAtion (MONICA), which implements over 30 methods developed in relevant fields and evaluates them on 12 long-tailed medical datasets covering 6 medical domains. Our work provides valuable practical guidance and insights for the field, offering detailed analysis and discussion of the effectiveness of individual components within the built-in state-of-the-art methodologies. We hope this codebase serves as a comprehensive and reproducible benchmark, encouraging further advancements in long-tailed medical image learning. The codebase is publicly available at this https URL.
https://arxiv.org/abs/2410.02010
Predicting phenotypes with complex genetic bases based on a small, interpretable set of variant features remains a challenging task. Conventionally, data-driven approaches are utilized for this task, yet the high-dimensional nature of genotype data makes the analysis and prediction difficult. Motivated by the extensive knowledge encoded in pre-trained LLMs and their success in processing complex biomedical concepts, we set out to examine the ability of LLMs in feature selection and engineering for tabular genotype data, with a novel knowledge-driven framework. We develop FREEFORM, Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling, designed with chain-of-thought and ensembling principles, to select and engineer features using the intrinsic knowledge of LLMs. Evaluated on two distinct genotype-phenotype datasets, genetic ancestry and hereditary hearing loss, we find this framework outperforms several data-driven methods, particularly in low-shot regimes. FREEFORM is available as an open-source framework on GitHub: this https URL.
https://arxiv.org/abs/2410.01795
LLMs are ideal for decision-making due to their ability to reason over long contexts and identify critical factors. However, challenges arise when processing transcripts of spoken speech describing complex scenarios. These transcripts often contain ungrammatical or incomplete sentences, repetitions, hedging, and vagueness. For example, during a company's earnings call, an executive might project a positive revenue outlook to reassure investors, despite significant uncertainty regarding future earnings. It is crucial for LLMs to incorporate this uncertainty systematically when making decisions. In this paper, we introduce DeFine, a new framework that constructs probabilistic factor profiles from complex scenarios. DeFine then integrates these profiles with analogical reasoning, leveraging insights from similar past experiences to guide LLMs in making critical decisions in novel situations. Our framework separates the tasks of quantifying uncertainty in complex scenarios and incorporating it into LLM decision-making. This approach is particularly useful in fields such as medical consultations, negotiations, and political debates, where making decisions under uncertainty is vital.
https://arxiv.org/abs/2410.01772
Cardiac magnetic resonance imaging (CMR), considered the gold standard for noninvasive cardiac assessment, is a diverse and complex modality requiring a wide variety of image processing tasks for comprehensive assessment of cardiac morphology and function. Advances in deep learning have enabled the development of state-of-the-art (SoTA) models for these tasks. However, model training is challenging due to data and label scarcity, especially in the less common imaging sequences. Moreover, each model is often trained for a specific task, with no connection between related tasks. In this work, we introduce a vision foundation model for CMR assessment, trained in a self-supervised fashion on 36 million CMR images. We then finetune the model in a supervised way for 9 clinical tasks typical of a CMR workflow, spanning classification, segmentation, landmark localization, and pathology detection. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes. We also demonstrate improved few-shot learning with fewer labeled samples, a common challenge in medical image analyses. We achieve out-of-the-box performance comparable to SoTA for most clinical tasks. The proposed method thus presents a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even with few annotated data available.
https://arxiv.org/abs/2410.01665
Non-ideal measurement computed tomography (NICT), which sacrifices optimal imaging standards for new advantages in CT imaging, is expanding the clinical application scope of CT images. However, with the reduction of imaging standards, the image quality has also been reduced, severely limiting clinical acceptability. Although numerous studies have demonstrated the feasibility of deep learning for NICT enhancement in specific scenarios, their high data cost and limited generalizability have become major obstacles. The recent research on foundation models has brought new opportunities for building a universal NICT enhancement model - bridging the image quality degradation with minimal data cost. However, owing to the challenges in the collection of large pre-training datasets and the compatibility of data variation, no success has been reported. In this paper, we propose a multi-scale integrated Transformer AMPlifier (TAMP), the first imaging foundation model for universal NICT enhancement. It has been pre-trained on a large-scale physics-driven simulation dataset with 3.6 million NICT-ICT image pairs, and is able to directly generalize to NICT enhancement tasks with various non-ideal settings and body regions. Via adaptation with few data, it can further achieve professional performance in real-world specific scenarios. Our extensive experiments have demonstrated that the proposed TAMP has significant potential for promoting the exploration and application of NICT and serving a wider range of medical scenarios.
https://arxiv.org/abs/2410.01591
Quantitative magnetic resonance imaging (qMRI) offers tissue-specific physical parameters with significant potential for neuroscience research and clinical practice. However, lengthy scan times for 3D multiparametric qMRI acquisition limit its clinical utility. Here, we propose SUMMIT, an innovative imaging methodology that includes data acquisition and an unsupervised reconstruction for simultaneous multiparametric qMRI. SUMMIT first encodes multiple important quantitative properties into highly undersampled k-space. It further leverages implicit neural representation incorporated with a dedicated physics model to reconstruct the desired multiparametric maps without needing external training datasets. SUMMIT delivers co-registered T1, T2, T2*, and quantitative susceptibility mapping. Extensive simulations and phantom imaging demonstrate SUMMIT's high accuracy. Additionally, the proposed unsupervised approach for qMRI reconstruction also introduces a novel zero-shot learning paradigm for multiparametric imaging applicable to various medical imaging modalities.
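A minimal sketch of the implicit-neural-representation component: an MLP maps spatial coordinates to per-voxel parameters. The layer sizes and the four-channel output standing in for T1, T2, T2*, and susceptibility are illustrative assumptions; a SUMMIT-style pipeline trains such a network against undersampled k-space through a physics model rather than with ground-truth maps:

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Implicit representation: (x, y, z) coordinates -> parameter vector.
    No external training data; the network itself is the reconstruction."""
    def __init__(self, hidden: int = 256, out_params: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_params))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)   # (N, 3) coords -> (N, 4) parameter estimates

# Toy usage: query 1024 voxel positions in [0, 1)^3.
maps = CoordinateMLP()(torch.rand(1024, 3))
```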
https://arxiv.org/abs/2410.01577
Test-time adaptation (TTA) has emerged as a promising paradigm to handle domain shifts at test time for medical images from different institutions without using extra training data. However, existing TTA solutions for segmentation tasks suffer from (1) dependency on modifying the source training stage and access to source priors, or (2) lack of emphasis on shape-related semantic knowledge that is crucial for segmentation tasks. Recent research on visual prompt learning achieves source-relaxed adaptation through an extended parameter space but still neglects the full utilization of semantic features, thus motivating our work on knowledge-enriched deep prompt learning. Beyond the general concern of image style shifts, we reveal that shape variability is another crucial factor causing the performance drop. To address this issue, we propose a TTA framework called PASS (Prompting to Adapt Styles and Semantic shapes), which jointly learns two types of prompts: an input-space prompt to reformulate the style of the test image to fit the pretrained model, and semantic-aware prompts to bridge high-level shape discrepancy across domains. Instead of naively imposing a fixed prompt, we introduce an input decorator to generate a self-regulating visual prompt conditioned on the input data. To retrieve knowledge representations and customize target-specific shape prompts for each test sample, we propose a cross-attention prompt modulator, which performs interaction between target representations and an enriched shape prompt bank. Extensive experiments demonstrate the superior performance of PASS over state-of-the-art methods on multiple medical image segmentation datasets. The code is available at this https URL.
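A minimal sketch of the input-space-prompt idea: learn a small additive perturbation on the test image with the pretrained model frozen. PASS conditions the prompt on the input and adds semantic shape prompts; this shows only a bare entropy-minimization loop, with illustrative names throughout:

```python
import torch
import torch.nn as nn

def adapt_input_prompt(model: nn.Module, image: torch.Tensor, steps: int = 10):
    """Optimize an additive input prompt at test time by minimizing the
    prediction entropy of a frozen segmentation model."""
    prompt = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([prompt], lr=1e-2)
    model.eval()
    for _ in range(steps):
        logits = model(image + prompt)                  # (B, classes, H, W)
        prob = logits.softmax(dim=1)
        entropy = -(prob * prob.clamp_min(1e-8).log()).sum(1).mean()
        opt.zero_grad(); entropy.backward(); opt.step()
    return model(image + prompt).argmax(1)              # adapted segmentation

# Toy usage with a stand-in "segmentation model".
seg = adapt_input_prompt(nn.Conv2d(1, 3, 3, padding=1), torch.randn(1, 1, 32, 32))
```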
https://arxiv.org/abs/2410.01573
Artificial intelligence (AI) and large language models (LLMs) in healthcare require advanced clinical skills (CS), yet current benchmarks fail to evaluate these comprehensively. We introduce MedQA-CS, an AI-SCE framework inspired by medical education's Objective Structured Clinical Examinations (OSCEs), to address this gap. MedQA-CS evaluates LLMs through two instruction-following tasks, LLM-as-medical-student and LLM-as-CS-examiner, designed to reflect real clinical scenarios. Our contributions include developing MedQA-CS, a comprehensive evaluation framework with publicly available data and expert annotations, and providing the quantitative and qualitative assessment of LLMs as reliable judges in CS evaluation. Our experiments show that MedQA-CS is a more challenging benchmark for evaluating clinical skills than traditional multiple-choice QA benchmarks (e.g., MedQA). Combined with existing benchmarks, MedQA-CS enables a more comprehensive evaluation of LLMs' clinical capabilities for both open- and closed-source LLMs.
https://arxiv.org/abs/2410.01553