Diffusion models for image generation have been a subject of increasing interest due to their ability to generate diverse, high-quality images. Image generation has immense potential in medical imaging because open-source medical images are difficult to obtain compared to natural images, especially for rare conditions. The generated images can later be used to train classification and segmentation models. In this paper, we propose simulating realistic ultrasound (US) images by successive fine-tuning of large diffusion models on different publicly available databases. To do so, we fine-tuned Stable Diffusion, a state-of-the-art latent diffusion model, on BUSI (Breast US Images), an ultrasound breast image dataset. We successfully generated high-quality US images of the breast using simple prompts that specify the organ and pathology, which appeared realistic to three experienced US scientists and a US radiologist. Additionally, we provided user control by conditioning the model with segmentations through ControlNet. We will release the source code at this http URL to allow the scientific community to generate US images quickly.
https://arxiv.org/abs/2502.08580
The growing availability of longitudinal Magnetic Resonance Imaging (MRI) datasets has facilitated Artificial Intelligence (AI)-driven modeling of disease progression, making it possible to predict future medical scans for individual patients. However, despite significant advancements in AI, current methods continue to face challenges including achieving patient-specific individualization, ensuring spatiotemporal consistency, efficiently utilizing longitudinal data, and managing the substantial memory demands of 3D scans. To address these challenges, we propose Brain Latent Progression (BrLP), a novel spatiotemporal model designed to predict individual-level disease progression in 3D brain MRIs. The key contributions in BrLP are fourfold: (i) it operates in a small latent space, mitigating the computational challenges posed by high-dimensional imaging data; (ii) it explicitly integrates subject metadata to enhance the individualization of predictions; (iii) it incorporates prior knowledge of disease dynamics through an auxiliary model, facilitating the integration of longitudinal data; and (iv) it introduces the Latent Average Stabilization (LAS) algorithm, which (a) enforces spatiotemporal consistency in the predicted progression at inference time and (b) allows us to derive a measure of the uncertainty for the prediction. We train and evaluate BrLP on 11,730 T1-weighted (T1w) brain MRIs from 2,805 subjects and validate its generalizability on an external test set comprising 2,257 MRIs from 962 subjects. Our experiments compare BrLP-generated MRI scans with real follow-up MRIs, demonstrating state-of-the-art accuracy compared to existing methods. The code is publicly available at: this https URL.
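The abstract describes the LAS algorithm only at a high level; one plausible reading is that several stochastic latent predictions are averaged for spatiotemporal stability, with their spread serving as the uncertainty measure. A minimal sketch under that assumption, where `toy_model` is an illustrative stand-in for the latent progression network, not the paper's model:

```python
import numpy as np

def latent_average_stabilization(predict_latent, z0, n_samples=8, rng=None):
    """LAS-style averaging sketch: run a stochastic latent predictor several
    times, average the predicted latents for stability, and report the
    per-dimension spread as an uncertainty estimate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    samples = np.stack([predict_latent(z0, rng) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

def toy_model(z, rng):
    # Illustrative stand-in: deterministic drift plus Gaussian noise.
    return z + 0.1 + rng.normal(0.0, 0.05, size=z.shape)

z0 = np.zeros(4)  # toy latent code for one subject
z_pred, z_unc = latent_average_stabilization(toy_model, z0, n_samples=64)
```

With more samples the averaged prediction concentrates around the drift term while the per-dimension standard deviation tracks the model's stochasticity.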
https://arxiv.org/abs/2502.08560
Medical image segmentation remains a formidable challenge due to label scarcity. Pre-training a Vision Transformer (ViT) through masked image modeling (MIM) on large-scale unlabeled medical datasets presents a promising solution, providing both computational efficiency and model generalization for various downstream tasks. However, current ViT-based MIM pre-training frameworks predominantly emphasize local aggregation representations in output layers and fail to exploit the rich representations across different ViT layers that better capture the fine-grained semantic information needed for more precise medical downstream tasks. To fill this gap, we present Hierarchical Encoder-driven MAE (Hi-End-MAE), a simple yet effective ViT-based pre-training solution, which centers on two key innovations: (1) Encoder-driven reconstruction, which encourages the encoder to learn more informative features to guide the reconstruction of masked patches; and (2) Hierarchical dense decoding, which implements a hierarchical decoding structure to capture rich representations across different layers. We pre-train Hi-End-MAE on a large-scale dataset of 10K CT scans and evaluate its performance across seven public medical image segmentation benchmarks. Extensive experiments demonstrate that Hi-End-MAE achieves superior transfer learning capabilities across various downstream tasks, revealing the potential of ViT in medical imaging applications. The code is available at: this https URL
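To make "hierarchical dense decoding" concrete, here is a toy sketch in which each masked-patch query cross-attends to visible-patch features from every encoder layer in turn, so the reconstruction draws on both shallow and deep representations. The single-head dot-product attention and the layer count are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_decode(mask_queries, layer_feats):
    """Toy hierarchical dense decoding: masked-patch queries cross-attend to
    the visible-patch features of EVERY encoder layer, shallow to deep, so
    low- and high-level representations both shape the reconstruction."""
    x = mask_queries
    for feats in layer_feats:  # one attention step per encoder layer
        attn = softmax(x @ feats.T / np.sqrt(x.shape[-1]))
        x = x + attn @ feats   # residual cross-attention update
    return x

rng = np.random.default_rng(0)
visible = [rng.normal(size=(10, 16)) for _ in range(3)]  # 3 encoder layers
queries = rng.normal(size=(4, 16))                        # 4 masked patches
recon = dense_decode(queries, visible)
```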
https://arxiv.org/abs/2502.08347
Accurate segmentation of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology segmentation as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to healthy ones. We enhance the existing density-based UVAS framework with two key innovations: (1) dense self-supervised learning (SSL) for feature extraction, eliminating the need for supervised pre-training, and (2) learned, masking-invariant dense features as conditioning variables, replacing hand-crafted positional encodings. Trained on over 30,000 unlabeled 3D CT volumes, our model, Screener, outperforms existing UVAS methods on four large-scale test datasets comprising 1,820 scans with diverse pathologies. Code and pre-trained models will be made publicly available.
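The abstract does not specify Screener's density estimator; a k-nearest-neighbor distance over "healthy" features is a common density-based UVAS baseline and conveys the core idea that rare (low-density) patterns score as anomalous:

```python
import numpy as np

def knn_anomaly_scores(healthy_feats, query_feats, k=5):
    """Distance to the k-th nearest 'healthy' feature vector: a large
    distance means low density under healthy data, i.e. more anomalous.
    A generic density-based stand-in, not the paper's exact estimator."""
    d = np.linalg.norm(query_feats[:, None, :] - healthy_feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(500, 8))     # features of healthy voxels
normal_q = rng.normal(0.0, 1.0, size=(20, 8))     # queries from the same pattern
anomalous_q = rng.normal(6.0, 1.0, size=(20, 8))  # rare, shifted pattern
scores_normal = knn_anomaly_scores(healthy, normal_q)
scores_anom = knn_anomaly_scores(healthy, anomalous_q)
```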
https://arxiv.org/abs/2502.08321
Precise classification of megakaryocytes is crucial for diagnosing myelodysplastic syndromes. Although self-supervised learning has shown promise in medical image analysis, its application to classifying megakaryocytes in stained slides faces three main challenges: (1) pervasive background noise that obscures cellular details, (2) a long-tailed distribution that limits data for rare subtypes, and (3) complex morphological variations leading to high intra-class variability. To address these issues, we propose the ActiveSSF framework, which integrates active learning with self-supervised pretraining. Specifically, our approach employs Gaussian filtering combined with K-means clustering and HSV analysis (augmented by clinical prior knowledge) for accurate region-of-interest extraction; an adaptive sample selection mechanism that dynamically adjusts similarity thresholds to mitigate class imbalance; and prototype clustering on labeled samples to overcome morphological complexity. Experimental results on clinical megakaryocyte datasets demonstrate that ActiveSSF not only achieves state-of-the-art performance but also significantly improves recognition accuracy for rare subtypes. Moreover, the integration of these advanced techniques further underscores the practical potential of ActiveSSF in clinical settings. To foster further research, the code and datasets will be publicly released in the future.
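The "adaptive sample selection mechanism that dynamically adjusts similarity thresholds" can be sketched as a class-frequency-dependent threshold, so rare subtypes face a lower bar and contribute more samples. The linear rule and its constants below are assumptions for illustration; the paper's exact rule may differ:

```python
import numpy as np

def adaptive_select(sims, labels, base_thresh=0.8, strength=0.3):
    """Select unlabeled candidates whose similarity to their nearest class
    prototype clears a class-dependent threshold; rarer classes get a lower
    threshold to mitigate class imbalance (illustrative rule)."""
    labels = np.asarray(labels)
    freqs = np.bincount(labels)[labels] / labels.size  # class frequency per candidate
    thresh = base_thresh - strength * (1.0 - freqs)    # rarer class => lower bar
    return sims >= thresh

sims = np.array([0.7, 0.7, 0.7, 0.7])  # similarity to nearest prototype
labels = np.array([0, 0, 0, 1])        # class 1 is the rare subtype
mask = adaptive_select(sims, labels)
```

With equal similarities, only the rare-class candidate clears its lowered threshold.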
https://arxiv.org/abs/2502.08200
Cancers are characterized by remarkable heterogeneity and diverse prognosis. Accurate cancer classification is essential for patient stratification and clinical decision-making. Although digital pathology has been advancing cancer diagnosis and prognosis, the paradigm in cancer pathology has shifted from purely relying on histology features to incorporating molecular markers. There is an urgent need for digital pathology methods to meet the needs of the new paradigm. We introduce a novel digital pathology approach to jointly predict molecular markers and histology features and model their interactions for cancer classification. Firstly, to mitigate the challenge of cross-magnification information propagation, we propose a multi-scale disentangling module, enabling the extraction of multi-scale features from high-magnification (cellular-level) to low-magnification (tissue-level) whole slide images. Further, based on the multi-scale features, we propose an attention-based hierarchical multi-task multi-instance learning framework to simultaneously predict histology and molecular markers. Moreover, we propose a co-occurrence probability-based label correlation graph network to model the co-occurrence of molecular markers. Lastly, we design a cross-modal interaction module with a dynamic confidence constraint loss and a cross-modal gradient modulation strategy to model the interactions of histology and molecular markers. Our experiments demonstrate that our method outperforms other state-of-the-art methods in classifying glioma, histology features, and molecular markers. Our method promises to promote precise oncology with the potential to advance biomedical research and clinical applications. The code is available at this https URL
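The co-occurrence probability-based label correlation idea can be sketched as estimating P(marker j | marker i) from training annotations and mixing each marker's logit with those of frequently co-occurring markers. The specific mixing rule and coefficient below are assumptions for illustration:

```python
import numpy as np

def cooccurrence_matrix(labels):
    """Row-normalized co-occurrence probabilities P(j | i) estimated from
    binary multi-label annotations (N samples x L markers)."""
    co = labels.T @ labels                 # raw co-occurrence counts
    diag = np.clip(np.diag(co), 1, None)   # per-marker counts (avoid /0)
    return co / diag[:, None]

def propagate(logits, P, alpha=0.3):
    """One graph step: blend each marker's logit with the logits of markers
    that frequently co-occur with it (sketch of the label-correlation idea)."""
    return (1 - alpha) * logits + alpha * (P @ logits)

labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 1],
                   [0, 0, 1]])
P = cooccurrence_matrix(labels)
logits = np.array([2.0, 0.0, -1.0])
out = propagate(logits, P)
```

Marker 1's logit is pulled upward because it always co-occurs with the confidently predicted marker 0.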
https://arxiv.org/abs/2502.07979
Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.
https://arxiv.org/abs/2502.07963
Employing self-supervised learning (SSL) methodologies assumes paramount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models. However, the intricate privacy dynamics surrounding medical data often preclude seamless data sharing among disparate medical centers. Federated learning (FL) emerges as a formidable solution to this privacy conundrum, yet within the realm of FL, optimizing model generalization stands as a pressing imperative. Robust generalization capabilities are essential to ensure the model's efficacy across diverse geographical domains after training on localized client datasets. In this paper, a Federated self-supervised Domain Generalization method, named LFDG, is proposed to enhance the generalization capacity of federated and label-efficient intestinal polyp segmentation. Based on a classical SSL method, DropPos, LFDG proposes an adversarial learning-based data augmentation method (SSADA) to enhance data diversity. LFDG further proposes a relaxation module based on Source-reconstruction and Augmentation-masking (SRAM) to maintain stability in feature learning. We have validated LFDG on polyp images from six medical centers. Our method outperforms the baseline by 3.80% and other recent FL and SSL methods by 3.92%.
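Adversarial learning-based augmentation in the spirit of SSADA can be illustrated with an FGSM-style step: perturb the input in the direction that increases the training loss, producing harder samples. The toy logistic model and its analytic gradient are stand-ins; SSADA's actual adversarial objective is more elaborate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_augment(x, y, w, eps=0.1):
    """Adversarial augmentation sketch: move the input x (label y in
    {-1, +1}) in the sign of the input-gradient of the loss
    L = -log(sigmoid(y * w.x)), which makes the sample harder."""
    margin = y * (x @ w)
    grad_x = -y * (1.0 - sigmoid(margin)) * w  # dL/dx for the toy model
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0])   # toy frozen classifier weights
x = np.array([0.5, 0.5])
y = 1.0
x_adv = fgsm_augment(x, y, w, eps=0.1)
loss_before = -np.log(sigmoid(y * (x @ w)))
loss_after = -np.log(sigmoid(y * (x_adv @ w)))
```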
https://arxiv.org/abs/2502.07951
Hypercomplex image processing extends conventional techniques in a unified paradigm encompassing algebraic and geometric principles. This work leverages quaternions and the two-dimensional orthogonal planes split framework (splitting of a quaternion - representing a pixel - into pairs of orthogonal 2D planes) for natural/biomedical image analysis through the following computational workflows and outcomes: natural/biomedical image re-colorization, natural image de-colorization, natural/biomedical image contrast enhancement, computational re-staining and stain separation in histological images, and performance gains in machine/deep learning pipelines for histological images. The workflows are analyzed separately for natural and biomedical images to showcase the effectiveness of the proposed approaches. The proposed workflows can regulate color appearance (e.g. with alternative renditions and grayscale conversion) and image contrast, be part of automated image processing pipelines (e.g. isolating stain components, boosting learning models), and assist in digital pathology applications (e.g. enhancing biomarker visibility, enabling colorblind-friendly renditions). Employing only basic arithmetic and matrix operations, this work offers a computationally accessible methodology - in the hypercomplex domain - that showcases versatility and consistency across image processing tasks and a range of computer vision and biomedical applications. The proposed non-data-driven methods achieve comparable or better results (particularly in cases involving well-known methods) to those reported in the literature, showcasing the potential of robust theoretical frameworks with practical effectiveness. Results, methods, and limitations are detailed alongside discussion of promising extensions, emphasizing the potential of feature-rich mathematical/computational frameworks for natural and biomedical images.
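The orthogonal planes split itself is standard quaternion algebra: given a pure unit quaternion f (so f² = −1), every quaternion decomposes as q = q₊ + q₋ with q± = ½(q ± f q f), the two parts lying in a pair of orthogonal 2D planes. A self-contained sketch with toy pixel values (the choice f = i is illustrative):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def ops_split(q, f):
    """Orthogonal planes split q = q_plus + q_minus with respect to a pure
    unit quaternion f: q_pm = (q +/- f*q*f) / 2."""
    fqf = qmul(qmul(f, q), f)
    return (q + fqf) / 2.0, (q - fqf) / 2.0

# An RGB pixel embedded as a pure quaternion (0, R, G, B), split along f = i.
q = np.array([0.0, 0.8, 0.4, 0.2])
f = np.array([0.0, 1.0, 0.0, 0.0])
q_plus, q_minus = ops_split(q, f)
```

For f = i the split isolates the {1, i} plane (here the R channel) from the {j, k} plane (the G and B channels), which is the lever the re-colorization and stain-separation workflows pull on.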
https://arxiv.org/abs/2502.07758
This paper presents a novel Natural Language Processing (NLP) framework for enhancing medical diagnosis through the integration of advanced techniques in data augmentation, feature extraction, and classification. The proposed approach employs back-translation to generate diverse paraphrased datasets, improving robustness and mitigating overfitting in classification tasks. Leveraging Decoding-enhanced BERT with Disentangled Attention (DeBERTa) with Dynamic Contextual Positional Gating (DCPG), the model captures fine-grained contextual and positional relationships, dynamically adjusting the influence of positional information based on semantic context to produce high-quality text embeddings. For classification, an Attention-Based Feedforward Neural Network (ABFNN) is utilized, effectively focusing on the most relevant features to improve decision-making accuracy. Applied to the classification of symptoms, clinical notes, and other medical texts, this architecture demonstrates its ability to address the complexities of medical data. The combination of data augmentation, contextual embedding generation, and advanced classification mechanisms offers a robust and accurate diagnostic tool, with potential applications in automated medical diagnosis and clinical decision support. This method demonstrates the effectiveness of the proposed NLP framework for medical diagnosis, achieving remarkable results with an accuracy of 99.78%, recall of 99.72%, precision of 99.79%, and an F1-score of 99.75%. These metrics not only underscore the model's robust performance in classifying medical texts with exceptional precision and reliability but also highlight its superiority over existing methods, making it a highly promising tool for automated diagnostic systems.
https://arxiv.org/abs/2502.07755
Zero-Shot Anomaly Detection (ZSAD) is an emerging AD paradigm. Unlike the traditional unsupervised AD setting that requires a large number of normal samples to train a model, ZSAD is more practical for handling data-restricted real-world scenarios. Recently, Multimodal Large Language Models (MLLMs) have shown revolutionary reasoning capabilities in various vision tasks. However, the reasoning of image abnormalities remains underexplored due to the lack of corresponding datasets and benchmarks. To facilitate research in AD & reasoning, we establish the first visual instruction tuning dataset, Anomaly-Instruct-125k, and the evaluation benchmark, VisA-D&R. Through investigation with our benchmark, we reveal that current MLLMs like GPT-4o cannot accurately detect and describe fine-grained anomalous details in images. To address this, we propose Anomaly-OneVision (Anomaly-OV), the first specialist visual assistant for ZSAD and reasoning. Inspired by human behavior in visual inspection, Anomaly-OV leverages a Look-Twice Feature Matching (LTFM) mechanism to adaptively select and emphasize abnormal visual tokens. Extensive experiments demonstrate that Anomaly-OV achieves significant improvements over advanced generalist models in both detection and reasoning. Extensions to medical and 3D AD are provided for future study. The link to our project page: this https URL
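The Look-Twice selection idea can be sketched as scoring each visual token against "normal" and "abnormal" text anchors and then emphasizing the most abnormal tokens on the second pass. The anchors, cosine scoring, and top-k rule below are illustrative assumptions, not the paper's interface:

```python
import numpy as np

def look_twice(tokens, normal_txt, abnormal_txt, top_k=2):
    """First look: score each visual token by which text anchor it matches.
    Second look: select the top-k most abnormal tokens for emphasis.
    A hand-rolled sketch of the selection idea only."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)
    score = cos(tokens, abnormal_txt) - cos(tokens, normal_txt)  # >0 => anomalous
    idx = np.argsort(score)[::-1][:top_k]
    return idx, score

rng = np.random.default_rng(1)
normal_txt = np.array([1.0, 0.0, 0.0, 0.0])
abnormal_txt = np.array([0.0, 1.0, 0.0, 0.0])
tokens = np.tile(normal_txt, (6, 1)) + 0.05 * rng.normal(size=(6, 4))
tokens[3] = abnormal_txt + 0.05 * rng.normal(size=4)  # one defective patch
idx, score = look_twice(tokens, normal_txt, abnormal_txt, top_k=1)
```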
https://arxiv.org/abs/2502.07601
Generative models, particularly text-to-image (T2I) diffusion models, play a crucial role in medical image analysis. However, these models are prone to training data memorization, posing significant risks to patient privacy. Synthetic chest X-ray generation is one of the most common applications in medical image analysis with the MIMIC-CXR dataset serving as the primary data repository for this task. This study adopts a data-driven approach and presents the first systematic attempt to identify prompts and text tokens in MIMIC-CXR that contribute the most to training data memorization. Our analysis reveals an unexpected finding: prompts containing traces of de-identification procedures are among the most memorized, with de-identification markers contributing the most. Furthermore, we also find existing inference-time memorization mitigation strategies are ineffective and fail to sufficiently reduce the model's reliance on memorized text tokens highlighting a broader issue in T2I synthesis with MIMIC-CXR. On this front, we propose actionable strategies to enhance privacy and improve the reliability of generative models in medical imaging. Finally, our results provide a foundation for future work on developing and benchmarking memorization mitigation techniques for synthetic chest X-ray generation using the MIMIC-CXR dataset.
https://arxiv.org/abs/2502.07516
In semi-supervised medical image segmentation, the poor quality of unlabeled data and the uncertainty in the model's predictions lead to models that inevitably produce erroneous pseudo-labels. These errors accumulate throughout training, thereby weakening the model's performance. We found that erroneous pseudo-labels are typically concentrated in high-uncertainty regions. Traditional methods improve performance by directly discarding pseudo-labels in these regions, but this can also discard potentially valuable training data. To alleviate this problem, we propose a bidirectional uncertainty-aware region learning strategy. When training on labeled data, we focus on high-uncertainty regions, using precise label information to guide the model's learning in potentially uncontrollable areas. Meanwhile, when training on unlabeled data, we concentrate on low-uncertainty regions to reduce the interference of erroneous pseudo-labels on the model. Through this bidirectional learning strategy, the model's overall performance improves significantly. Extensive experiments show that our proposed method achieves significant performance improvements on different medical image segmentation tasks.
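The bidirectional selection can be sketched with a normalized predictive-entropy map: labeled-data loss is restricted to high-uncertainty pixels (ground truth supervises the hard regions), while unlabeled-data loss is restricted to low-uncertainty pixels (only trustworthy pseudo-labels). The entropy measure and the threshold `tau` are assumptions; the paper does not fix them in the abstract:

```python
import numpy as np

def entropy_uncertainty(probs, eps=1e-8):
    """Per-pixel predictive entropy, normalized to [0, 1]."""
    h = -np.sum(probs * np.log(probs + eps), axis=-1)
    return h / np.log(probs.shape[-1])

def region_masks(probs, tau=0.5):
    """Bidirectional region selection: (labeled-focus mask over HIGH
    uncertainty, unlabeled-focus mask over LOW uncertainty)."""
    u = entropy_uncertainty(probs)
    return u > tau, u <= tau

probs = np.array([[0.99, 0.01],   # confident pixel
                  [0.55, 0.45]])  # uncertain pixel
hi_mask, lo_mask = region_masks(probs)
```

Each per-pixel loss term is then multiplied by the corresponding mask before averaging.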
https://arxiv.org/abs/2502.07457
Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels, hindering model generalization. This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification. We first extend the Prov-GigaPath vision foundation model, pre-trained on 1.3 billion pathology image tiles, into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. The model is then used to extract visual features and text embeddings from few-shot annotations and is fine-tuned with learnable prompt embeddings. Unlike prior methods that combine prompts with frozen features using prefix embeddings or self-attention, we propose multi-granular attention that compares the interactions of learnable prompts with individual image patches and with groups of patches. This approach improves the model's ability to capture both fine-grained details and broader context, enhancing its recognition of complex patterns across sub-regions. To further improve accuracy, we leverage an (unbalanced) optimal transport-based visual-text distance to secure model robustness by mitigating perturbations that might occur during the data augmentation process. Empirical experiments on lung, kidney, and breast pathology modalities validate the effectiveness of our approach; we surpass several of the latest competitors and consistently improve performance across diverse architectures, including CLIP, PLIP, and Prov-GigaPath-integrated PLIP. We release our implementations and pre-trained models at MGPATH.
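The optimal-transport visual-text distance can be illustrated with the simpler balanced, entropy-regularized Sinkhorn variant (a stand-in for the unbalanced transport the abstract names); the histograms, cost matrix, and regularization strength below are toy assumptions:

```python
import numpy as np

def sinkhorn_distance(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularized (balanced) Sinkhorn OT distance between
    histograms a and b under a pairwise cost matrix."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # transport plan
    return np.sum(P * cost)

def pairwise(x, y):
    return np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

# Distance between a prompt-token "cloud" and two patch-feature "clouds".
prompts = np.array([[0.0, 0.0], [1.0, 1.0]])
patches_near = prompts + 0.1
patches_far = prompts + 3.0
a = np.full(2, 0.5)
b = np.full(2, 0.5)
d_near = sinkhorn_distance(pairwise(prompts, patches_near), a, b)
d_far = sinkhorn_distance(pairwise(prompts, patches_far), a, b)
```

Features perturbed far from the prompts incur a larger transport cost, which is the robustness signal the loss exploits.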
https://arxiv.org/abs/2502.07409
Manual segmentation is labor-intensive, and automatic segmentation remains challenging due to the inherent variability in meniscal morphology, partial volume effects, and low contrast between the meniscus and surrounding tissues. To address these challenges, we propose ERANet, an innovative semi-supervised framework for meniscus segmentation that effectively leverages both labeled and unlabeled images through advanced augmentation and learning strategies. ERANet integrates three key components: edge replacement augmentation (ERA), prototype consistency alignment (PCA), and a conditional self-training (CST) strategy within a mean teacher architecture. ERA introduces anatomically relevant perturbations by simulating meniscal variations, ensuring that augmentations align with the structural context. PCA enhances segmentation performance by aligning intra-class features and promoting compact, discriminative feature representations, particularly in scenarios with limited labeled data. CST improves segmentation robustness by iteratively refining pseudo-labels and mitigating the impact of label noise during training. Together, these innovations establish ERANet as a robust and scalable solution for meniscus segmentation, effectively addressing key barriers to practical implementation. We validated ERANet comprehensively on 3D Double Echo Steady State (DESS) and 3D Fast/Turbo Spin Echo (FSE/TSE) MRI sequences. The results demonstrate the superior performance of ERANet compared to state-of-the-art methods. The proposed framework achieves reliable and accurate segmentation of meniscus structures, even when trained on minimal labeled data. Extensive ablation studies further highlight the synergistic contributions of ERA, PCA, and CST, solidifying ERANet as a transformative solution for semi-supervised meniscus segmentation in medical imaging.
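Two of the named ingredients are easy to sketch: the mean-teacher update (an exponential moving average of student weights) and a conditional self-training step that keeps only confident pseudo-labels. The EMA coefficient and confidence cutoff are assumed values, not the paper's:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher weight update: the teacher tracks an exponential
    moving average of the student weights."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def conditional_pseudo_labels(probs, conf_thresh=0.9):
    """CST sketch: keep a pseudo-label only where the teacher is confident;
    uncertain pixels are marked -1 and ignored by the loss."""
    labels = probs.argmax(axis=-1)
    keep = probs.max(axis=-1) >= conf_thresh
    return np.where(keep, labels, -1)

teacher = {"w": np.array([1.0])}
student = {"w": np.array([0.0])}
teacher = ema_update(teacher, student)          # drifts slightly toward student
probs = np.array([[0.95, 0.05], [0.6, 0.4]])    # teacher predictions, 2 pixels
pl = conditional_pseudo_labels(probs)
```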
https://arxiv.org/abs/2502.07331
Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSI) is crucial for various clinical applications. However, training such models typically requires labor-intensive, pixel-wise annotations by domain experts. Recent efforts have democratized this process by involving lay annotators without medical expertise. However, conventional non-agent-based approaches struggle to handle annotation noise adaptively, as they lack mechanisms to mitigate false positives (FP) and false negatives (FN) at both the image-feature and pixel levels. In this paper, we propose a consensus-aware self-corrective AI agent that leverages the Consensus Matrix to guide its learning process. The Consensus Matrix defines regions where both the AI and annotators agree on cell and non-cell annotations, which are prioritized with stronger supervision. Conversely, areas of disagreement are adaptively weighted based on their feature similarity to high-confidence agreement regions, with more similar regions receiving greater attention. Additionally, contrastive learning is employed to separate features of noisy regions from those of reliable agreement regions by maximizing their dissimilarity. This paradigm enables the AI to iteratively refine noisy labels, enhancing its robustness. Validated on one real-world lay-annotated cell dataset and two simulated noisy datasets, our method demonstrates improved segmentation performance, effectively correcting FP and FN errors and showcasing its potential for training robust models on noisy datasets. The official implementation and cell annotations are publicly available at this https URL.
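A Consensus-Matrix-style supervision rule can be sketched per pixel: weight 1 where the AI and annotator agree, and weight disagreement pixels by their feature similarity to the agreement regions. The exponential similarity weighting is an assumption for illustration; the paper's exact weighting may differ:

```python
import numpy as np

def consensus_weights(ai_mask, annot_mask, feats, sim_scale=4.0):
    """Per-pixel supervision weights: 1.0 on agreement pixels; disagreement
    pixels get exp(-scale * distance) to the mean agreement feature, so
    pixels resembling high-confidence regions receive greater attention."""
    agree = ai_mask == annot_mask
    proto = feats[agree].mean(axis=0)            # agreement-region prototype
    d = np.linalg.norm(feats - proto, axis=-1)
    w = np.exp(-sim_scale * d)                   # similar => weight near 1
    return np.where(agree, 1.0, w)

ai = np.array([1, 1, 0, 0])       # AI cell/non-cell prediction
annot = np.array([1, 0, 0, 1])    # lay-annotator labels (noisy)
feats = np.array([[0.0], [0.1], [0.0], [2.0]])  # toy per-pixel features
w = consensus_weights(ai, annot, feats)
```

The second pixel disagrees but looks like the agreement regions, so it keeps a high weight; the fourth disagrees and looks different, so it is down-weighted.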
https://arxiv.org/abs/2502.07302
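The consensus-weighting scheme described above — full supervision where the AI and annotators agree, similarity-based down-weighting where they disagree — can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the paper's implementation; the flattened mask shapes, the single agreement prototype, and the `sim_floor` parameter are all assumptions made for the sketch.

```python
import numpy as np

def consensus_weights(ai_mask, annot_mask, features, sim_floor=0.0):
    """Per-pixel loss weights from an AI/annotator consensus map.

    Agreement pixels (both say cell, or both say background) get weight 1;
    disagreement pixels are weighted by the cosine similarity of their
    feature to the mean feature of the agreement regions, so regions that
    look like trusted ones receive more attention.
    """
    agree = (ai_mask == annot_mask)              # the Consensus Matrix
    agree_feat = features[agree].mean(axis=0)    # prototype of trusted pixels
    # cosine similarity of every pixel's feature to the agreement prototype
    num = features @ agree_feat
    den = np.linalg.norm(features, axis=-1) * np.linalg.norm(agree_feat) + 1e-8
    sim = num / den
    return np.where(agree, 1.0, np.clip(sim, sim_floor, 1.0))
```

In a full pipeline these weights would multiply the per-pixel segmentation loss, and a contrastive term would additionally push disagreement-region features away from the agreement prototype.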
Single-source domain generalization (SDG) in medical image segmentation remains a significant challenge, particularly for images with varying color distributions and qualities. Models trained on high-quality images often fail to generalize to low-quality test images because of these color and quality shifts. In this work, we propose two novel techniques to enhance generalization: a dynamic color image normalization (DCIN) module and a color-quality generalization (CQG) loss. The DCIN dynamically normalizes the color of test images using two reference image selection strategies. Specifically, the DCIN utilizes a global reference image selection (GRIS), which finds a universal reference image, and a local reference image selection (LRIS), which selects a semantically similar reference image per test sample. Additionally, the CQG loss enforces invariance to color and quality variations by ensuring consistent segmentation predictions across transformed image pairs. Experimental results show that our proposed techniques significantly improve segmentation performance over the baseline on two target domain datasets, despite being trained solely on a single source domain. Notably, our model achieved up to a 32.3-point increase in Dice score compared to the baseline, consistently producing robust and usable results even under substantial domain shifts. Our work contributes to the development of more robust medical image segmentation models that generalize across unseen domains. The implementation code is available at this https URL.
https://arxiv.org/abs/2502.07200
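The two ingredients above lend themselves to a short sketch: a nearest-neighbor reference pick in feature space (in the spirit of LRIS) followed by per-channel mean/std color normalization toward the chosen reference (a Reinhard-style stand-in for DCIN's normalization). Both function signatures are assumptions for illustration, not the authors' code.

```python
import numpy as np

def lris_pick(test_feat, ref_feats):
    """Local reference image selection: nearest reference in feature space."""
    dists = np.linalg.norm(ref_feats - test_feat, axis=1)
    return int(np.argmin(dists))

def color_normalize(img, ref):
    """Match each channel's mean/std of `img` to the reference image.

    A simplified, Reinhard-style stand-in for dynamic color normalization.
    """
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[-1]):
        mu_i, sd_i = img[..., c].mean(), img[..., c].std() + 1e-8
        mu_r, sd_r = ref[..., c].mean(), ref[..., c].std() + 1e-8
        out[..., c] = (img[..., c] - mu_i) / sd_i * sd_r + mu_r
    return out
```

A CQG-style loss would then penalize disagreement between segmentation predictions on two differently color/quality-transformed copies of the same image.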
Counterfactual explanations in medical imaging are critical for understanding the predictions made by deep learning models. We extend the Latent Shift counterfactual generation method from 2D applications to 3D computed tomography (CT) scans. We address the challenges associated with 3D data, such as limited training samples and high memory demands, by implementing a slice-based approach. This method leverages a 2D encoder trained on CT slices, which are subsequently combined to maintain 3D context. We demonstrate this technique on two models for clinical phenotype prediction and lung segmentation. Our approach is both memory-efficient and effective for generating interpretable counterfactuals in high-resolution 3D medical imaging.
https://arxiv.org/abs/2502.07156
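The slice-based Latent Shift idea above — encode each 2D slice, shift its latent against the classifier's gradient, decode, and restack into a volume — can be illustrated with linear toy models, where the gradient of the classifier score with respect to the latent is analytic. Everything here (the linear `E`, `D`, `w`, and the shift strength `lam`) is a toy assumption; the paper uses trained networks and automatic differentiation.

```python
import numpy as np

# Toy linear stand-ins: E encodes a 64-pixel slice to an 8-dim latent,
# D decodes it back, and w is a linear classifier on the decoded slice.
rng = np.random.default_rng(0)
E = rng.standard_normal((8, 64)) * 0.1   # 2-D "encoder"
D = E.T                                  # 2-D "decoder"
w = rng.standard_normal(64) * 0.1        # linear "classifier" weights

def latent_shift_volume(volume, lam):
    """Shift each slice's latent against the classifier gradient, decode,
    and restack the slices -- the slice-based 3-D scheme in miniature."""
    out = []
    for sl in volume:                    # iterate over axial slices
        z = E @ sl.ravel()               # encode one slice
        grad = D.T @ w                   # d f(D(z)) / d z for f(x) = w @ x
        z_cf = z - lam * grad            # move latent to lower the score
        out.append((D @ z_cf).reshape(sl.shape))
    return np.stack(out)                 # reassemble the 3-D context
```

With `lam > 0` the decoded counterfactual volume scores strictly lower under the classifier than the decoded original, which is the behavior the counterfactual is meant to expose.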
Anatomy evaluation is crucial for understanding the physiological state, diagnosing abnormalities, and guiding medical interventions. Statistical shape modeling (SSM) is vital in this process. By enabling the extraction of quantitative morphological shape descriptors from MRI and CT scans, SSM provides comprehensive descriptions of anatomical variations within a population. However, the effectiveness of SSM in anatomy evaluation hinges on the quality and robustness of the shape models. While deep learning techniques show promise in addressing these challenges by learning complex nonlinear representations of shapes, existing models still have limitations and often require pre-established shape models for training. To overcome these issues, we propose Mesh2SSM++, a novel approach that learns to estimate correspondences from meshes in an unsupervised manner. This method leverages unsupervised, permutation-invariant representation learning to estimate how to deform a template point cloud into subject-specific meshes, forming a correspondence-based shape model. Additionally, our probabilistic formulation allows learning a population-specific template, reducing potential biases associated with template selection. A key feature of Mesh2SSM++ is its ability to quantify aleatoric uncertainty, which captures inherent data variability and is essential for ensuring reliable model predictions and robust decision-making in clinical tasks, especially under challenging imaging conditions. Through extensive validation across diverse anatomies, evaluation metrics, and downstream tasks, we demonstrate that Mesh2SSM++ outperforms existing methods. Its ability to operate directly on meshes, combined with computational efficiency and interpretability through its probabilistic framework, makes it an attractive alternative to traditional and deep learning-based SSM approaches.
https://arxiv.org/abs/2502.07145
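The core mechanism above — deforming a shared template point cloud into each subject's mesh so that point orderings line up across the population — can be illustrated with a toy nearest-point projection. Because the output keeps the template's point ordering, point *i* corresponds across all subjects, which is the essence of a correspondence-based shape model. This is an illustrative stand-in, not Mesh2SSM++'s learned, probabilistic deformation.

```python
import numpy as np

def deform_template(template, subject_pts):
    """Project each template point onto its nearest subject point.

    The result is ordered like the template, so the same index refers to
    the same anatomical location across subjects.
    """
    d = np.linalg.norm(template[:, None, :] - subject_pts[None, :, :], axis=-1)
    return subject_pts[d.argmin(axis=1)]

def shape_model(template, subjects):
    """Stack per-subject correspondences and compute the population mean shape."""
    corr = np.stack([deform_template(template, s) for s in subjects])
    return corr, corr.mean(axis=0)
```

In the actual method the deformation is predicted by a permutation-invariant network, the template itself is learned for the population, and the probabilistic formulation additionally yields per-point aleatoric uncertainty.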
Accurate and efficient diagnosis in online medical consultations remains a challenge for current large language models. These models often rely on single-turn interactions and lack the ability to refine their predictions through follow-up questions. Additionally, their responses frequently contain complex medical terminology, making them less accessible to non-medical users and creating barriers to effective communication. In this paper, we introduce Ask Patients with Patience (APP), the first multi-turn dialogue framework that enables LLMs to iteratively refine diagnoses through grounded reasoning. By integrating medical guidelines and entropy minimization, APP improves both diagnostic accuracy and efficiency. Furthermore, it features human-centric communication that bridges the gap between user comprehension and medical terminology, significantly enhancing user accessibility and engagement. We evaluated APP using a subset of the ReMeDi dataset, comparing it with single-turn and traditional multi-turn LLM baselines. APP achieved higher similarity scores in diagnosis predictions, demonstrating better alignment with ground truth diagnoses. Entropy analysis showed that APP reduces diagnostic uncertainty more rapidly across iterations, increasing confidence in its predictions. APP also excels in user accessibility and empathy, further bridging the gap between complex medical language and user understanding. Code will be released at: this https URL.
https://arxiv.org/abs/2502.07143
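The entropy-minimization ingredient above can be illustrated with a tiny Bayesian sketch: maintain a posterior over candidate diagnoses and pick the follow-up question whose expected posterior entropy is lowest. The likelihood-table interface (`likelihoods[q][a][d]` = P(answer a | diagnosis d, question q)) is an assumption made for this sketch, not APP's actual mechanism, which operates over LLM dialogue turns.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def best_question(prior, likelihoods):
    """Return the index of the question minimizing expected posterior entropy.

    For each candidate question, average the entropy of the Bayesian
    posterior over diagnoses across its possible answers, weighted by how
    likely each answer is under the current prior.
    """
    scores = []
    for like in likelihoods:              # one (answers x diagnoses) table per question
        exp_h = 0.0
        for a in range(like.shape[0]):    # possible answers
            joint = like[a] * prior
            p_a = joint.sum()
            if p_a > 0:
                exp_h += p_a * entropy(joint / p_a)
        scores.append(exp_h)
    return int(np.argmin(scores))
```

A perfectly discriminative question (each answer implies a distinct diagnosis) drives the expected entropy to zero and is therefore always preferred over an uninformative one.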