The recent development of large deep learning models in medicine shows remarkable performance in medical image analysis and diagnosis, but their large number of parameters causes high memory consumption and inference latency. Knowledge distillation offers a solution, but with high-resolution pathological images and only slide-level labels, slide-level gradients cannot be backpropagated to update the student model. This study presents an Efficient Fine-tuning on Compressed Models (EFCM) framework with two stages: unsupervised feature distillation and fine-tuning. In the distillation stage, Feature Projection Distillation (FPD) is proposed with a TransScan module for adaptive receptive-field adjustment to enhance the knowledge-absorption capability of the student model. In the slide-level fine-tuning stage, three strategies (Reuse CLAM, Retrain CLAM, and End2end Train CLAM (ETC)) are compared. Experiments are conducted on 11 downstream datasets related to three large medical models: RETFound for retina, MRM for chest X-ray, and BROW for histopathology. The results demonstrate that the EFCM framework significantly improves accuracy and efficiency on slide-level pathological image problems, effectively addressing the challenges of deploying large medical models. Specifically, it achieves a 4.33% increase in ACC and a 5.2% increase in AUC over the large model BROW on the TCGA-NSCLC and TCGA-BRCA datasets. An analysis of model inference efficiency highlights the high efficiency of the distillation fine-tuning method.
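No implementation is given here; as a rough sketch of the feature-distillation idea behind FPD, assuming hypothetical dimensions and omitting the TransScan module entirely, the student's patch features can be projected into the teacher's embedding space and matched with an MSE loss:

```python
import torch
import torch.nn as nn

class FeatureProjectionDistiller(nn.Module):
    """Minimal feature-distillation sketch: project student features into the
    teacher's embedding space and match them with an MSE loss. A hypothetical
    stand-in for FPD; the TransScan module is omitted."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feats, teacher_feats):
        return nn.functional.mse_loss(self.proj(student_feats), teacher_feats)

# Patch-level distillation needs no slide-level labels or gradients.
student_feats = torch.randn(32, 384)   # e.g. compact student embeddings
teacher_feats = torch.randn(32, 768)   # e.g. frozen BROW teacher embeddings
loss = FeatureProjectionDistiller(384, 768)(student_feats, teacher_feats)
loss.backward()
```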
https://arxiv.org/abs/2409.11817
The recent success of large language models (LLMs) and the scaling law has led to widespread adoption of larger models. Particularly in the healthcare industry, there is increasing demand for locally operated LLMs due to security concerns. However, the majority of high-quality open-source LLMs have a size of 70B parameters, imposing significant financial burdens on users for GPU preparation and operation. To overcome these issues, we present a medical adaptation based on the recent 7B models, which enables operation with low computational resources. We compare performance on medical question-answering benchmarks in two languages (Japanese and English), demonstrating that its scores reach parity with or surpass those of currently existing medical LLMs that are ten times larger. We find that fine-tuning an English-centric base model on a Japanese medical dataset improves the score in both languages, supporting the effect of cross-lingual knowledge transfer. We hope that this study will alleviate financial challenges, serving as a stepping stone for clinical institutions to practically utilize LLMs locally. Our evaluation code is available at this https URL.
https://arxiv.org/abs/2409.11783
Abdominal computed tomography (CT) scans are frequently performed in clinical settings. Opportunistic CT involves repurposing routine CT images to extract diagnostic information and is an emerging tool for detecting underdiagnosed conditions such as sarcopenia, hepatic steatosis, and ascites. This study utilizes deep learning methods to promote accurate diagnosis and clinical documentation. We analyze 2,674 inpatient CT scans to identify discrepancies between imaging phenotypes (characteristics derived from opportunistic CT scans) and their corresponding documentation in radiology reports and ICD coding. Through our analysis, we find that only 0.5%, 3.2%, and 30.7% of scans diagnosed with sarcopenia, hepatic steatosis, and ascites (respectively) through either opportunistic imaging or radiology reports were ICD-coded. Our findings demonstrate opportunistic CT's potential to enhance diagnostic precision and accuracy of risk adjustment models, offering advancements in precision medicine.
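For intuition, the core of the discrepancy analysis is a per-condition comparison of imaging-derived findings against ICD codes; a toy version with hypothetical column names:

```python
import pandas as pd

# One row per scan; boolean flags for a positive finding (from opportunistic
# imaging or the radiology report) and for a matching ICD code. Column names
# and values are illustrative, not the study's schema.
scans = pd.DataFrame({
    "finding_positive": [True, True, True, False, True],
    "icd_coded":        [False, True, False, False, False],
})

positives = scans[scans["finding_positive"]]
coded_rate = positives["icd_coded"].mean()  # fraction of positives ICD-coded
print(f"{coded_rate:.1%} of positive scans were ICD-coded")
```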
https://arxiv.org/abs/2409.11686
Histopathology analysis is the gold standard for medical diagnosis. Accurate classification of whole slide images (WSIs) and localization of regions of interest (ROIs) can assist pathologists in diagnosis. The gigapixel resolution of WSIs and the absence of fine-grained annotations make direct classification and analysis challenging. In weakly supervised learning, multiple instance learning (MIL) presents a promising approach for WSI classification. The prevailing strategy is to use attention mechanisms to measure instance importance for classification. However, attention mechanisms fail to capture inter-instance information, and self-attention incurs quadratic computational complexity. To address these challenges, we propose AMD-MIL, an agent aggregator with a mask denoise mechanism. The agent token acts as an intermediate variable between the query and key for computing instance importance. Mask and denoising matrices, mapped from the agent-aggregated values, dynamically mask low-contribution representations and eliminate noise. AMD-MIL achieves better attention allocation by adjusting feature representations, capturing micro-metastases in cancer and improving interpretability. Extensive experiments on CAMELYON-16, CAMELYON-17, TCGA-KIDNEY, and TCGA-LUNG show AMD-MIL's superiority over state-of-the-art methods.
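The agent-token mechanism resembles agent attention, which breaks the N x N attention map into two smaller ones through a few intermediate tokens; a minimal sketch under assumed shapes (the mask and denoising matrices are omitted):

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, agent):
    """Linear-complexity attention sketch: a small set of agent tokens mediates
    between queries and keys. q, k, v: (N, d); agent: (A, d) with A << N.
    Simplified; not AMD-MIL's full aggregator."""
    d = q.shape[-1]
    agent_vals = F.softmax(agent @ k.T / d**0.5, dim=-1) @ v     # agents read all instances: (A, d)
    return F.softmax(q @ agent.T / d**0.5, dim=-1) @ agent_vals  # instances read agents: (N, d)

q = k = v = torch.randn(1000, 64)  # 1,000 instance embeddings from one WSI
agent = torch.randn(16, 64)        # 16 learnable agent tokens
print(agent_attention(q, k, v, agent).shape)  # torch.Size([1000, 64])
```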
https://arxiv.org/abs/2409.11664
The cell is arguably the smallest unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of AI-powered Virtual Cells, where robust representations of cells and cellular systems under different conditions are directly learned from growing biological data across measurements and scales. We discuss desired capabilities of AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities and requirements to realize this vision including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions is within reach.
https://arxiv.org/abs/2409.11654
Out-of-distribution (OOD) detection is crucial for enhancing the generalization of AI models used in mammogram screening. Given the challenge of limited prior knowledge about OOD samples in external datasets, unsupervised generative learning is a preferable solution, training the model to discern the normal characteristics of in-distribution (ID) data. The hypothesis is that, during inference, the model reconstructs ID samples accurately, while OOD samples exhibit poorer reconstruction due to their divergence from normality. Inspired by state-of-the-art (SOTA) hybrid architectures combining CNNs and transformers, we developed a novel backbone, HAND, for detecting OOD samples in large-scale digital screening mammogram studies. To boost learning efficiency, we incorporated synthetic OOD samples and a parallel discriminator in the latent space to distinguish between ID and OOD samples. Gradient reversal applied to the OOD reconstruction loss penalizes the model for learning OOD reconstructions. An anomaly score is computed by weighting the reconstruction and discriminator losses. On an internal held-out RSNA mammogram test set and an external hand-curated Mayo Clinic dataset, the proposed HAND model outperformed encoder-based and GAN-based baselines and, interestingly, also outperformed the hybrid CNN+transformer baselines. The proposed HAND pipeline thus offers an automated, efficient computational solution for domain-specific quality checks in external screening mammograms, yielding actionable insights without direct exposure to the private medical imaging data.
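Gradient reversal is a standard autograd construction; a minimal sketch of it, together with a hypothetical weighted anomaly score of the kind described:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass,
    penalizing the reconstruction branch for learning OOD samples."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def anomaly_score(recon_loss, disc_loss, alpha=0.5):
    # Hypothetical weighting of the two terms; the paper's weights may differ.
    return alpha * recon_loss + (1 - alpha) * disc_loss

x = torch.randn(4, 128, requires_grad=True)
GradReverse.apply(x).sum().backward()
print(x.grad[0, :3])  # tensor([-1., -1., -1.]) -- gradients are negated
```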
https://arxiv.org/abs/2409.11534
Effective retinal vessel segmentation requires a sophisticated integration of global contextual awareness and local vessel continuity. To address this challenge, we propose the Graph Capsule Convolution Network (GCC-UNet), which merges capsule convolutions with CNNs to capture both local and global features. The Graph Capsule Convolution operator is specifically designed to enhance the representation of global context, while the Selective Graph Attention Fusion module ensures seamless integration of local and global information. To further improve vessel continuity, we introduce the Bottleneck Graph Attention module, which incorporates Channel-wise and Spatial Graph Attention mechanisms. The Multi-Scale Graph Fusion module adeptly combines features from various scales. Our approach has been rigorously validated through experiments on widely used public datasets, with ablation studies confirming the efficacy of each component. Comparative results highlight GCC-UNet's superior performance over existing methods, setting a new benchmark in retinal vessel segmentation. Notably, this work represents the first integration of vanilla, graph, and capsule convolutional techniques in the domain of medical image segmentation.
https://arxiv.org/abs/2409.11508
In the medical domain, acquiring large datasets poses significant challenges due to privacy concerns. Nonetheless, the development of a robust deep-learning model for retinal disease diagnosis necessitates a substantial dataset for training, and the capacity to generalize effectively on smaller datasets remains a persistent challenge. This scarcity of data presents a significant barrier to the practical implementation of scalable medical AI solutions. To address this issue, we combine a wide range of data sources to improve performance and generalization to new data, and develop a self-supervised framework based on large language models (LLMs) and SwinV2 to gain a deeper understanding of multi-modal dataset representations, enhancing the model's ability to extrapolate to new data for the detection of eye diseases from optical coherence tomography (OCT) images. We adopt a two-phase training methodology: self-supervised pre-training followed by fine-tuning of a downstream supervised classifier. An ablation study conducted across three datasets, employing various encoder backbones, without data fusion, in a low-data-availability setting, and without self-supervised pre-training, highlights the robustness of our method. Our findings demonstrate consistent performance across these diverse conditions, showcasing superior generalization capabilities compared to the baseline model, ResNet-50.
https://arxiv.org/abs/2409.11375
When employing deep neural networks (DNNs) for semantic segmentation in safety-critical applications like automotive perception or medical imaging, it is important to estimate their performance at runtime, e.g. via uncertainty estimates or prediction quality estimates. Previous works mostly performed uncertainty estimation at the pixel level. In one line of research, a connected-component-wise (segment-wise) perspective was taken, approaching uncertainty estimation at the object level by performing so-called meta classification and meta regression to estimate uncertainty and prediction quality, respectively. In those works, each predicted segment is considered individually to estimate its uncertainty or prediction quality. However, neighboring segments may provide additional hints on whether a given predicted segment is of high quality, which we study in the present work. On the basis of uncertainty-indicating metrics at the segment level, we use graph neural networks (GNNs) to model a given segment's quality as a function of its own metrics as well as those of its neighboring segments. We compare different GNN architectures and achieve a notable performance improvement.
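A minimal sketch of the idea, assuming segment-wise uncertainty metrics as node features on a segment-adjacency graph; plain-tensor message passing stands in for a GNN library, and nothing here is the authors' exact architecture:

```python
import torch
import torch.nn as nn

class SegmentGCN(nn.Module):
    """Two-layer graph network over predicted segments: node features are a
    segment's uncertainty metrics; the output is a per-segment quality
    estimate (meta regression, e.g. predicted IoU)."""

    def __init__(self, in_dim: int, hidden: int = 32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        # adj: (N, N) row-normalized adjacency with self-loops.
        h = torch.relu(self.lin1(adj @ x))     # aggregate neighbors' metrics
        return self.lin2(adj @ h).squeeze(-1)  # per-segment quality score

n_segments, n_metrics = 10, 8
x = torch.randn(n_segments, n_metrics)  # segment-wise uncertainty metrics
adj = torch.eye(n_segments)             # self-loops only, for the demo
print(SegmentGCN(n_metrics)(x, adj).shape)  # torch.Size([10])
```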
https://arxiv.org/abs/2409.11373
Biomedical image segmentation is crucial for accurately diagnosing and analyzing various diseases. However, Convolutional Neural Networks (CNNs) and Transformers, the most commonly used architectures for this task, struggle to effectively capture long-range dependencies due to the inherent locality of CNNs and the computational complexity of Transformers. To address this limitation, we introduce TTT-Unet, a novel framework that integrates Test-Time Training (TTT) layers into the traditional U-Net architecture for biomedical image segmentation. TTT-Unet dynamically adjusts model parameters at test time, enhancing the model's ability to capture both local and long-range features. We evaluate TTT-Unet on multiple medical imaging datasets, including 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results demonstrate that TTT-Unet consistently outperforms state-of-the-art CNN-based and Transformer-based segmentation models across all tasks. The code is available at this https URL.
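As a toy illustration of the test-time-training idea (not TTT-Unet's implementation), a linear "hidden state" can be adapted by one self-supervised gradient step per input before producing its output:

```python
import torch

def ttt_linear_step(W, x, lr=0.1):
    """Simplified TTT update: the hidden state is a small model W, adapted by a
    gradient step on a self-supervised reconstruction loss for the current
    input, then used to produce the output."""
    loss = ((x @ W - x) ** 2).mean()   # self-supervised target: reconstruct x
    (grad,) = torch.autograd.grad(loss, W)
    W = W - lr * grad                  # parameters change at test time
    return x @ W, W

d = 16
W = torch.randn(d, d, requires_grad=True)  # initial "fast weights"
x = torch.randn(4, d)
out, W = ttt_linear_step(W, x)
print(out.shape)  # torch.Size([4, 16])
```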
https://arxiv.org/abs/2409.11299
This paper introduces Bio-Inspired Mamba (BIM), a novel online learning framework for selective state space models that integrates biological learning principles with the Mamba architecture. BIM combines Real-Time Recurrent Learning (RTRL) with Spike-Timing-Dependent Plasticity (STDP)-like local learning rules, addressing the challenges of temporal locality and biological plausibility in training spiking neural networks. Our approach leverages the inherent connection between backpropagation through time and STDP, offering a computationally efficient alternative that maintains the ability to capture long-range dependencies. We evaluate BIM on language modeling, speech recognition, and biomedical signal analysis tasks, demonstrating competitive performance against traditional methods while adhering to biological learning principles. Results show improved energy efficiency and potential for neuromorphic hardware implementation. BIM not only advances the field of biologically plausible machine learning but also provides insights into the mechanisms of temporal information processing in biological neural networks.
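The flavor of an STDP-like local rule can be shown in a few lines; a toy pair-based update (not BIM's exact formulation, and with hypothetical constants):

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP sketch: potentiate when the presynaptic spike precedes
    the postsynaptic one, depress otherwise, with exponentially decaying
    magnitude in the spike-time difference (milliseconds)."""
    dt = t_post - t_pre
    if dt > 0:
        return w + a_plus * math.exp(-dt / tau)  # causal pairing: potentiation
    return w - a_minus * math.exp(dt / tau)      # anti-causal: depression

w = stdp_update(0.5, t_pre=10.0, t_post=15.0)  # pre fires 5 ms before post
print(round(w, 4))  # 0.5078
```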
https://arxiv.org/abs/2409.11263
In recent years, there has been significant development in the analysis of medical data using machine learning. The onset of Age-related Macular Degeneration (AMD) is believed to be associated with genetic polymorphisms. However, genetic analysis is costly, and artificial intelligence may offer assistance. This paper presents a method that predicts the presence of multiple susceptibility genes for AMD using fundus and Optical Coherence Tomography (OCT) images, as well as medical records. Experimental results demonstrate that integrating information from multiple modalities can effectively predict the presence of susceptibility genes with over 80% accuracy.
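The abstract does not detail the fusion; a minimal late-fusion sketch, with all dimensions hypothetical, concatenates per-modality embeddings and treats the genes as one multi-label output:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy multimodal fusion (not the paper's architecture): concatenate
    fundus, OCT, and medical-record embeddings; output one logit per gene."""

    def __init__(self, fundus_dim=512, oct_dim=512, record_dim=32, n_genes=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(fundus_dim + oct_dim + record_dim, 256), nn.ReLU(),
            nn.Linear(256, n_genes),
        )

    def forward(self, fundus, oct_feat, record):
        return self.head(torch.cat([fundus, oct_feat, record], dim=-1))

logits = LateFusionClassifier()(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 32))
probs = torch.sigmoid(logits)  # per-gene presence probabilities
print(probs.shape)  # torch.Size([4, 8])
```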
https://arxiv.org/abs/2409.11128
Segmentation of fetal and maternal structures, particularly in intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time- and cost-consuming and ii) often yields inconsistent results. The utility of automatic segmentation algorithms for biometry has been proven, though existing results remain suboptimal. To push forward advancements in this area, the Grand Challenge on Pubic Symphysis-Fetal Head Segmentation (PSFHS) was held alongside the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). The challenge aimed to advance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date: 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions. The scientific community's enthusiastic participation led to the selection of the top 8 of 179 entries from 193 registrants in the initial phase to proceed to the competition's second stage. These algorithms have elevated the state of the art in automatic PSFHS from intrapartum ultrasound images. A thorough analysis of the results pinpointed ongoing challenges in the field and outlined recommendations for future work. The top solutions and the complete dataset remain publicly available, fostering further advancements in automatic segmentation and biometry for intrapartum ultrasound imaging.
https://arxiv.org/abs/2409.10980
Coronary heart disease (CHD) is a severe cardiac disease; its early diagnosis is therefore essential, as it improves treatment results and saves money on medical care. The ongoing development of quantum computing and machine learning (ML) technologies may bring practical improvements to the performance of CHD diagnosis. Quantum machine learning (QML) is receiving tremendous interest in various disciplines due to its higher performance and capabilities. A quantum leap in the healthcare industry would increase processing power and optimize multiple models, and QML techniques have the potential to forecast cardiac disease and help in early detection. To predict the risk of coronary heart disease, this paper presents a hybrid approach utilizing an ensemble machine learning model based on QML classifiers. Our approach, with its ability to handle multidimensional healthcare data, supports the method's robustness by fusing quantum and classical ML algorithms in a multi-step inferential framework. The marked rise in heart disease and death rates impacts worldwide human health and the global economy, and reducing cardiac morbidity and mortality requires early detection of heart disease. In this research, the hybrid approach uses techniques with quantum computing capabilities to tackle complex problems that are not amenable to conventional machine learning algorithms and to minimize computational expense. The proposed method was developed on the Raspberry Pi 5 Graphics Processing Unit (GPU) platform and tested on a broad dataset that integrates clinical and imaging data from patients suffering from CHD and from healthy controls. Compared to classical machine learning models, the accuracy, sensitivity, F1 score, and specificity of the proposed hybrid QML model for CHD are several-fold higher.
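The ensemble side of such a hybrid can be pictured with a soft-voting classifier; in the sketch below, classical scikit-learn models stand in for the QML classifiers, which would slot in as estimators exposing the same fit/predict_proba interface (synthetic data, illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),  # placeholder where a QML classifier would go
    ],
    voting="soft",  # average predicted probabilities across members
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:3]))
```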
https://arxiv.org/abs/2409.10932
Photoacoustic computed tomography (PACT) is a non-invasive imaging modality with wide medical applications. Conventional PACT image reconstruction algorithms suffer from wavefront distortion caused by the heterogeneous speed of sound (SOS) in tissue, which leads to image degradation. Accounting for these effects improves image quality, but measuring the SOS distribution is experimentally expensive. An alternative approach is to perform joint reconstruction of the initial pressure image and the SOS using only the PA signals. Existing joint reconstruction methods come with limitations: high computational cost, inability to directly recover the SOS, and reliance on inaccurate simplifying assumptions. Implicit neural representations, or neural fields, are an emerging technique in computer vision for learning an efficient and continuous representation of physical fields with a coordinate-based neural network. In this work, we introduce NF-APACT, an efficient self-supervised framework utilizing neural fields to estimate the SOS in service of an accurate and robust multi-channel deconvolution. Our method removes SOS aberrations an order of magnitude faster and more accurately than existing methods. We demonstrate the success of our method on a novel numerical phantom as well as an experimentally collected phantom and in vivo data. Our code and numerical phantom are available at this https URL.
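A neural field is simply a coordinate-based MLP; a minimal SOS-field sketch under assumed layer sizes (positional encoding and the multi-channel deconvolution are omitted):

```python
import torch
import torch.nn as nn

class SOSField(nn.Module):
    """Coordinate-based MLP: maps an (x, y) location to a speed-of-sound value,
    giving a continuous, differentiable SOS map that can be optimized
    self-supervisedly against the PA signals. Sizes are hypothetical."""

    def __init__(self, hidden: int = 128, base_sos: float = 1500.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.base_sos = base_sos  # roughly water's SOS in m/s; net learns deviations

    def forward(self, coords):
        return self.base_sos + self.net(coords).squeeze(-1)

coords = torch.rand(1024, 2) * 2 - 1  # query points in [-1, 1]^2
print(SOSField()(coords).shape)       # torch.Size([1024])
```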
https://arxiv.org/abs/2409.10876
Accurately interpreting medical images and writing radiology reports is a critical but challenging task in healthcare. Both human-written and AI-generated reports can contain errors, ranging from clinical inaccuracies to linguistic mistakes. To address this, we introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. Working with board-certified radiologists, we developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility. ReXErr demonstrates consistency across error categories and produces errors that closely mimic those found in real-world scenarios. This method has the potential to aid in the development and evaluation of report correction algorithms, potentially enhancing the quality and reliability of radiology reporting.
https://arxiv.org/abs/2409.10829
Multi-frequency Electrical Impedance Tomography (mfEIT) is a promising biomedical imaging technique that estimates tissue conductivities across different frequencies. Current state-of-the-art (SOTA) algorithms, which rely on supervised learning and Multiple Measurement Vectors (MMV), require extensive training data, making them time-consuming, costly, and less practical for widespread applications. Moreover, the dependency on training data in supervised MMV methods can introduce erroneous conductivity contrasts across frequencies, posing significant concerns in biomedical applications. To address these challenges, we propose a novel unsupervised learning approach based on Multi-Branch Attention Image Prior (MAIP) for mfEIT reconstruction. Our method employs a carefully designed Multi-Branch Attention Network (MBA-Net) to represent multiple frequency-dependent conductivity images and simultaneously reconstructs mfEIT images by iteratively updating its parameters. By leveraging the implicit regularization capability of the MBA-Net, our algorithm can capture significant inter- and intra-frequency correlations, enabling robust mfEIT reconstruction without the need for training data. Through simulation and real-world experiments, our approach demonstrates performance comparable to, or better than, SOTA algorithms while exhibiting superior generalization capability. These results suggest that the MAIP-based method can be used to improve the reliability and applicability of mfEIT in various settings.
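The training-data-free idea is in the spirit of a deep image prior: the network's parameters are the only unknowns, optimized for consistency with the measurements. A toy sketch with a hypothetical linear forward operator (MBA-Net's multi-branch attention is omitted):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_pixels, n_meas = 64, 32
A = torch.randn(n_meas, n_pixels)  # stand-in for the (nonlinear) EIT forward model
v_meas = torch.randn(n_meas)       # stand-in for measured boundary voltages

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, n_pixels))
z = torch.randn(16)                # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):            # iterative updates, no training data involved
    sigma = net(z)                             # conductivity image estimate
    loss = ((A @ sigma - v_meas) ** 2).mean()  # data-consistency loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final data misfit: {loss.item():.4f}")
```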
https://arxiv.org/abs/2409.10794
Predicting high-dimensional or extreme multilabels, such as in medical coding, requires both accuracy and interpretability. Existing works often rely on local interpretability methods, failing to provide comprehensive explanations of the overall mechanism behind each label prediction within a multilabel set. We propose a mechanistic interpretability module called DIctionary Label Attention (DILA) that disentangles uninterpretable dense embeddings into a sparse embedding space, where each nonzero element (a dictionary feature) represents a globally learned medical concept. Through human evaluations, we show that our sparse embeddings are more human-understandable than their dense counterparts by at least 50 percent. Our automated dictionary feature identification pipeline, leveraging large language models (LLMs), uncovers thousands of learned medical concepts by examining and summarizing the highest-activating tokens for each dictionary feature. We represent the relationships between dictionary features and medical codes through a sparse interpretable matrix, enhancing the mechanistic and global understanding of the model's predictions while maintaining competitive performance and scalability without extensive human annotation.
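The disentangling step can be pictured as a sparse autoencoder over the dense label-attention embeddings; a minimal sketch with hypothetical dimensions and sparsity weight:

```python
import torch
import torch.nn as nn

class SparseDictionary(nn.Module):
    """Sparse-autoencoder sketch: a dense embedding is encoded into a much
    wider, mostly-zero code (each active unit ~ one dictionary feature, i.e. a
    candidate medical concept) and decoded back."""

    def __init__(self, dense_dim: int = 512, dict_size: int = 8192):
        super().__init__()
        self.encode = nn.Linear(dense_dim, dict_size)
        self.decode = nn.Linear(dict_size, dense_dim)

    def forward(self, h):
        code = torch.relu(self.encode(h))  # sparse, nonnegative activations
        recon = self.decode(code)
        loss = nn.functional.mse_loss(recon, h) + 1e-3 * code.abs().mean()  # L1 sparsity
        return code, loss

h = torch.randn(16, 512)            # dense embeddings from the coding model
code, loss = SparseDictionary()(h)
loss.backward()
print(code.shape)  # torch.Size([16, 8192])
```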
https://arxiv.org/abs/2409.10504
Semi-supervised medical image segmentation has shown promise in training models with limited labeled data and abundant unlabeled data. However, state-of-the-art methods ignore a potentially valuable source of unsupervised semantic information -- spatial registration transforms between image volumes. To address this, we propose CCT-R, a contrastive cross-teaching framework incorporating registration information. To leverage the semantic information available in registrations between volume pairs, CCT-R incorporates two proposed modules: Registration Supervision Loss (RSL) and Registration-Enhanced Positive Sampling (REPS). The RSL leverages segmentation knowledge derived from transforms between labeled and unlabeled volume pairs, providing an additional source of pseudo-labels. REPS enhances contrastive learning by identifying anatomically-corresponding positives across volumes using registration transforms. Experimental results on two challenging medical segmentation benchmarks demonstrate the effectiveness and superiority of CCT-R across various semi-supervised settings, with as few as one labeled case. Our code is available at this https URL.
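A sketch of what an RSL-style pseudo-label could look like, assuming the registration transform is supplied as a sampling grid (shapes are hypothetical; the cross-teaching and REPS components are omitted):

```python
import torch
import torch.nn.functional as F

def registration_supervision_loss(student_logits, labeled_seg, grid):
    """Warp a labeled volume's one-hot segmentation through a precomputed
    registration grid into the unlabeled volume's frame and use the result
    as a pseudo-label for the student."""
    pseudo = F.grid_sample(labeled_seg, grid, align_corners=True)  # (B, C, D, H, W)
    return F.cross_entropy(student_logits, pseudo.argmax(dim=1))

B, C, D, H, W = 1, 4, 8, 16, 16
student_logits = torch.randn(B, C, D, H, W, requires_grad=True)
labeled_seg = F.one_hot(torch.randint(0, C, (B, D, H, W)), C).permute(0, 4, 1, 2, 3).float()
# Identity transform for the demo; a real grid would come from the registration.
grid = F.affine_grid(torch.eye(3, 4).unsqueeze(0), (B, C, D, H, W), align_corners=True)
registration_supervision_loss(student_logits, labeled_seg, grid).backward()
```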
https://arxiv.org/abs/2409.10422
For more efficient generalization to unseen domains (classes), most Few-shot Segmentation (FSS) methods directly exploit pre-trained encoders and only fine-tune the decoder, especially in the current era of large models. However, such fixed feature encoders tend to be class-agnostic, inevitably activating objects that are irrelevant to the target class. In contrast, humans can effortlessly focus on specific objects in the line of sight. This paper mimics the visual perception pattern of human beings and proposes a novel and powerful prompt-driven scheme, called "Prompt and Transfer" (PAT), which constructs a dynamic class-aware prompting paradigm to tune the encoder to focus on the object of interest (the target class) in the current task. Three key points are elaborated to enhance the prompting: 1) Cross-modal linguistic information is introduced to initialize prompts for each task. 2) Semantic Prompt Transfer (SPT) precisely transfers the class-specific semantics within the images to the prompts. 3) A Part Mask Generator (PMG) works in conjunction with SPT to adaptively generate different but complementary part prompts for different individuals. Surprisingly, PAT achieves competitive performance on 4 different tasks including standard FSS, Cross-domain FSS (e.g., CV, medical, and remote sensing domains), Weak-label FSS, and Zero-shot Segmentation, setting a new state of the art on 11 benchmarks.
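Class-aware prompting of this kind typically prepends learnable tokens, optionally initialized from text embeddings of the class name, to the patch sequence; a minimal sketch under assumed sizes (SPT and PMG are omitted):

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Prompt-tuning sketch: learnable prompt tokens are prepended to the patch
    tokens so the encoder can be steered toward the target class. Sizes are
    hypothetical; this is not PAT's full scheme."""

    def __init__(self, dim: int = 256, n_prompts: int = 4, text_init=None):
        super().__init__()
        init = text_init if text_init is not None else torch.randn(n_prompts, dim)
        self.prompts = nn.Parameter(init.clone())  # cross-modal init if provided
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patch_tokens):
        B = patch_tokens.shape[0]
        tokens = torch.cat([self.prompts.expand(B, -1, -1), patch_tokens], dim=1)
        return self.encoder(tokens)[:, self.prompts.shape[0]:]  # drop prompt outputs

x = torch.randn(2, 196, 256)  # patch tokens from a (frozen) backbone
print(PromptedEncoder()(x).shape)  # torch.Size([2, 196, 256])
```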
https://arxiv.org/abs/2409.10389