In recent years, deep learning based on Convolutional Neural Networks (CNNs) has achieved remarkable success in many applications. However, their heavy reliance on extensive labeled data and limited generalization ability to unseen classes pose challenges to their suitability for medical image processing tasks. Few-shot learning, which utilizes a small amount of labeled data to generalize to unseen classes, has emerged as a critical research area, attracting substantial attention. Currently, most studies employ a prototype-based approach, in which prototypical networks are used to construct prototypes from the support set, guiding the processing of the query set to obtain the final results. While effective, this approach heavily relies on the support set while neglecting the query set, resulting in notable disparities within the model classes. To mitigate this drawback, we propose a novel Support-Query Prototype Fusion Network (SQPFNet). SQPFNet initially generates several support prototypes for the foreground areas of the support images, thus producing a coarse segmentation mask. Subsequently, a query prototype is constructed based on the coarse segmentation mask, additionally exploiting pattern information in the query set. Thus, SQPFNet constructs high-quality support-query fused prototypes, upon which the query image is segmented to obtain the final refined query mask. Evaluation results on two public datasets, SABS and CMR, show that SQPFNet achieves state-of-the-art performance.
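The prototype pipeline described above can be sketched in a few lines. This is a minimal NumPy illustration of the general scheme (masked average pooling to build a support prototype, cosine-similarity segmentation for a coarse mask, a query prototype from that mask, then convex fusion); the fusion weight `alpha`, the binarization threshold, and all function names are illustrative assumptions, not SQPFNet's exact design.

```python
import numpy as np

def masked_average_pooling(feat, mask):
    # feat: (C, H, W) feature map, mask: (H, W) binary -> prototype (C,)
    w = mask.astype(feat.dtype)
    return (feat * w).sum(axis=(1, 2)) / (w.sum() + 1e-8)

def cosine_similarity_map(feat, proto):
    # per-pixel cosine similarity between feat (C, H, W) and proto (C,)
    num = np.einsum('chw,c->hw', feat, proto)
    denom = np.linalg.norm(feat, axis=0) * np.linalg.norm(proto) + 1e-8
    return num / denom

def support_query_fusion(sup_feat, sup_mask, qry_feat, alpha=0.5, thr=0.5):
    # 1) support prototype -> coarse query mask
    sup_proto = masked_average_pooling(sup_feat, sup_mask)
    coarse = (cosine_similarity_map(qry_feat, sup_proto) > thr).astype(np.float32)
    # 2) query prototype built from the coarse mask
    qry_proto = masked_average_pooling(qry_feat, coarse)
    # 3) fuse the two prototypes and re-segment the query image
    fused = alpha * sup_proto + (1 - alpha) * qry_proto
    return (cosine_similarity_map(qry_feat, fused) > thr).astype(np.float32)
```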
https://arxiv.org/abs/2405.07516
Linking (aligning) biomedical concepts across diverse data sources enables various integrative analyses, but it is challenging due to the discrepancies in concept naming conventions. Various strategies have been developed to overcome this challenge, such as those based on string-matching rules, manually crafted thesauri, and machine learning models. However, these methods are constrained by limited prior biomedical knowledge and can hardly generalize beyond the limited amounts of rules, thesauri, or training samples. Recently, large language models (LLMs) have exhibited impressive results in diverse biomedical NLP tasks due to their unprecedentedly rich prior knowledge and strong zero-shot prediction abilities. However, LLMs suffer from issues including high costs, limited context length, and unreliable predictions. In this research, we propose PromptLink, a novel biomedical concept linking framework that leverages LLMs. It first employs a biomedical-specialized pre-trained language model to generate candidate concepts that can fit in the LLM context windows. Then it utilizes an LLM to link concepts through two-stage prompts, where the first-stage prompt aims to elicit the biomedical prior knowledge from the LLM for the concept linking task and the second-stage prompt enforces the LLM to reflect on its own predictions to further enhance their reliability. Empirical results on the concept linking task between two EHR datasets and an external biomedical KG demonstrate the effectiveness of PromptLink. Furthermore, PromptLink is a generic framework without reliance on additional prior knowledge, context, or training data, making it well-suited for concept linking across various types of data sources. The source code is available at this https URL.
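The two-stage scheme above (candidate generation that fits the LLM context window, then a linking prompt followed by a self-reflection prompt) can be sketched roughly as follows. The retrieval step and the prompt wording are assumptions for illustration; PromptLink's actual candidate generator is a biomedical-specialized pre-trained language model, stubbed here by precomputed embedding vectors.

```python
import numpy as np

def top_k_candidates(query_vec, concept_vecs, names, k=3):
    # candidate generation: nearest concepts by cosine similarity, keeping
    # the shortlist small enough to fit in the LLM context window
    sims = concept_vecs @ query_vec / (
        np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8)
    order = np.argsort(-sims)[:k]
    return [names[i] for i in order]

def stage1_prompt(mention, candidates):
    # first-stage prompt: elicit biomedical prior knowledge for linking
    return (f"Which of the following biomedical concepts matches '{mention}'?\n"
            + "\n".join(f"- {c}" for c in candidates))

def stage2_prompt(mention, prediction):
    # second-stage prompt: make the LLM reflect on its own prediction
    return (f"You previously linked '{mention}' to '{prediction}'. "
            "Reflect on this prediction: answer YES if it is correct, NO otherwise.")
```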
https://arxiv.org/abs/2405.07500
Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.
https://arxiv.org/abs/2405.07468
Developing accurate machine learning models for oncology requires large-scale, high-quality multimodal datasets. However, creating such datasets remains challenging due to the complexity and heterogeneity of medical data. To address this challenge, we introduce HoneyBee, a scalable modular framework for building multimodal oncology datasets that leverages foundational models to generate representative embeddings. HoneyBee integrates various data modalities, including clinical records, imaging data, and patient outcomes. It employs data preprocessing techniques and transformer-based architectures to generate embeddings that capture the essential features and relationships within the raw medical data. The generated embeddings are stored in a structured format using Hugging Face datasets and PyTorch dataloaders for accessibility. Vector databases enable efficient querying and retrieval for machine learning applications. We demonstrate the effectiveness of HoneyBee through experiments assessing the quality and representativeness of the embeddings. The framework is designed to be extensible to other medical domains and aims to accelerate oncology research by providing high-quality, machine learning-ready datasets. HoneyBee is an ongoing open-source effort, and the code, datasets, and models are available at the project repository.
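The vector-database step above, efficient querying and retrieval over stored embeddings, can be illustrated with a minimal in-memory stand-in. The class name and API are hypothetical; a real deployment would use a dedicated vector store behind the Hugging Face datasets / PyTorch dataloader layer the paper describes.

```python
import numpy as np

class VectorStore:
    # minimal in-memory stand-in for a vector database holding embeddings
    def __init__(self, dim):
        self.dim = dim
        self.vecs = np.empty((0, dim), dtype=np.float32)
        self.ids = []

    def add(self, id_, vec):
        # store an L2-normalized embedding so dot product == cosine similarity
        v = np.asarray(vec, dtype=np.float32).reshape(1, self.dim)
        self.vecs = np.vstack([self.vecs, v / (np.linalg.norm(v) + 1e-8)])
        self.ids.append(id_)

    def query(self, vec, k=1):
        # return the ids of the k nearest stored embeddings
        q = np.asarray(vec, dtype=np.float32)
        q = q / (np.linalg.norm(q) + 1e-8)
        sims = self.vecs @ q
        return [self.ids[i] for i in np.argsort(-sims)[:k]]
```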
https://arxiv.org/abs/2405.07460
This study presents a novel methodology utilizing a pre-trained speech recognition model for processing respiratory sound data. By incorporating medical record information, we introduce an innovative multi-modal deep-learning architecture, named Rene, which addresses the challenges of poor interpretability and underperformance in real-time clinical diagnostic response observed in previous respiratory disease-focused models. The proposed Rene architecture demonstrated significant improvements of 10.24%, 16.15%, 15.29%, and 18.90% respectively, compared to the baseline across four tasks related to respiratory event detection and audio record classification on the SPRSound database. In patient disease prediction tests on the ICBHI database, the architecture exhibited improvements of 23% in the mean of average score and harmonic score compared to the baseline. Furthermore, we developed a real-time respiratory sound discrimination system based on the Rene architecture, featuring a dual-thread design and compressed model parameters for simultaneous microphone recording and real-time dynamic decoding. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation, facilitating deployment on wearable clinical detection devices to capture incremental data, which can be synergistically evolved with large-scale models deployed on cloud servers for downstream tasks.
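The dual-thread design mentioned above, one thread recording while another decodes, is a classic producer-consumer pattern. A minimal sketch with Python's standard library; the chunk iterable and the `classify` callable stand in for microphone capture and the compressed Rene model.

```python
import queue
import threading

def recorder(audio_chunks, buf):
    # thread 1: microphone capture (simulated by an iterable of chunks)
    for chunk in audio_chunks:
        buf.put(chunk)
    buf.put(None)  # end-of-stream sentinel

def decoder(buf, classify, results):
    # thread 2: real-time dynamic decoding of buffered chunks
    while True:
        chunk = buf.get()
        if chunk is None:
            break
        results.append(classify(chunk))

def run_pipeline(audio_chunks, classify):
    buf, results = queue.Queue(), []
    t1 = threading.Thread(target=recorder, args=(audio_chunks, buf))
    t2 = threading.Thread(target=decoder, args=(buf, classify, results))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```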
https://arxiv.org/abs/2405.07442
Medical images are often harder to acquire than natural images because of the specialized equipment and techniques involved, which leaves medical image datasets comparatively small. It is therefore difficult to train a strong pretrained medical vision model, and how best to exploit a natural-image pretrained vision model and adapt it to the medical domain remains an open question. For image classification, a popular method is the linear probe (LP). However, LP considers only the output after feature extraction, while a gap remains between the input medical images and the natural pretrained vision model. We introduce visual prompting (VP) to fill this gap and analyze strategies for coupling LP and VP. We design a joint learning loss function containing a categorisation loss and a discrepancy loss, which describes the variance between prompted and plain images, and name this joint training strategy MoVL (Mixture of Visual Prompting and Linear Probe). We experiment on four medical image classification datasets with two mainstream architectures, ResNet and CLIP. Results show that, without changing the parameters or architecture of the backbone model and with fewer trainable parameters, MoVL can approach full fine-tuning (FF) accuracy (on four medical datasets, an average of 90.91% for MoVL versus 91.13% for FF). On an out-of-distribution medical dataset, our method (90.33%) outperforms FF (85.15%) by an absolute 5.18%.
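The joint objective can be read as L = L_cls + λ·L_disc. A small sketch under assumed forms: cross-entropy for the categorisation loss, a mean-squared feature difference for the discrepancy loss, and a weight `lam`; the paper's exact formulation of the discrepancy term may differ.

```python
import numpy as np

def cross_entropy(logits, label):
    # numerically stable cross-entropy for a single example
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def movl_loss(logits_prompted, label, feat_prompted, feat_plain, lam=0.1):
    # joint objective: classification loss on the prompted image plus a
    # discrepancy term between prompted and plain image features (assumption)
    cls = cross_entropy(logits_prompted, label)
    disc = np.mean((feat_prompted - feat_plain) ** 2)
    return cls + lam * disc
```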
https://arxiv.org/abs/2405.07411
Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, the drug repurposing approaches proposed thus far have yet to meet expectations, so it is crucial to offer a systematic approach to drug repurposing that achieves cost savings and improves human lives. In recent years, biological network-based methods for drug repurposing have generated promising results. Nevertheless, these methods have limitations. First, their scope is generally limited by the size and variety of data they can effectively handle. Another issue arises from heterogeneous data, which must either be handled specially or converted into homogeneous data, losing information in the process. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge at certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach to knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrate HGTDR's capability to predict other types of relations, such as drug-protein and disease-protein interrelations, through numerical and experimental validation.
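Step 3 of HGTDR, scoring a candidate relation with a fully connected network, might look like the sketch below. The layer sizes, ReLU activation, and sigmoid output are assumptions for illustration; the embeddings would come from the heterogeneous graph transformer of step 2.

```python
import numpy as np

def relation_score(h_emb, t_emb, W1, b1, w2, b2):
    # a small fully connected network scores a (drug, disease) pair
    # from the concatenated embeddings of the head and tail entities
    x = np.concatenate([h_emb, t_emb])
    hidden = np.maximum(W1 @ x + b1, 0.0)       # ReLU hidden layer
    logit = float(w2 @ hidden + b2)
    return 1.0 / (1.0 + np.exp(-logit))         # probability of a link
```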
https://arxiv.org/abs/2405.08031
We present MedConceptsQA, a dedicated open-source benchmark for medical concepts question answering. The benchmark comprises questions about various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We conducted evaluations of the benchmark using various Large Language Models. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at this https URL
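Evaluation on a multiple-choice benchmark like this reduces to comparing model answers against gold answers and contrasting the result with the random-guessing baseline. A minimal sketch (not the benchmark's actual harness):

```python
def accuracy(preds, golds):
    # fraction of benchmark questions answered correctly
    assert len(preds) == len(golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def random_guess_baseline(n_choices):
    # expected accuracy of uniform random guessing on n_choices options
    return 1.0 / n_choices
```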
https://arxiv.org/abs/2405.07348
Semi-supervised medical image segmentation has gained growing interest due to its ability to utilize unannotated data. The current state-of-the-art methods mostly rely on pseudo-labeling within a co-training framework. These methods depend on a single pseudo-label for training, but these labels are not as accurate as the ground truth of labeled data. Relying solely on one pseudo-label often results in suboptimal results. To this end, we propose a novel approach where multiple pseudo-labels for the same unannotated image are used to learn from the unlabeled data: the conventional fixed pseudo-label and the newly introduced dynamic pseudo-label. By incorporating multiple pseudo-labels for the same unannotated image into the co-training framework, our approach provides a more robust training approach that improves model performance and generalization capabilities. We validate our novel approach on three semi-supervised medical benchmark segmentation datasets: the Left Atrium dataset, the Pancreas-CT dataset, and the BraTS 2019 dataset. Our approach significantly outperforms state-of-the-art methods over multiple medical benchmark segmentation datasets with different labeled data ratios. We also present several ablation experiments to demonstrate the effectiveness of various components used in our approach.
https://arxiv.org/abs/2405.07256
Despite rapid advances in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory-efficient patch-based processing for high-resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods, which are limited by memory constraints, our method enables training on ultra-high-resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like the Jetson Nano. Our code is available at this https URL.
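Patch-based processing with a cheap global-context summary can be sketched as follows. Average pooling as the global representation and the helper names are assumptions; the actual framework learns its context representation rather than pooling.

```python
import numpy as np

def split_into_patches(img, p):
    # img: (H, W) with H and W divisible by p; returns p x p local patches
    H, W = img.shape
    return [img[i:i + p, j:j + p] for i in range(0, H, p) for j in range(0, W, p)]

def global_context(img, g):
    # cheap global representation: average-pool the full image down to (g, g)
    H, W = img.shape
    return img.reshape(g, H // g, g, W // g).mean(axis=(1, 3))

def patch_inputs(img, p, g):
    # pair every local patch with the shared global-context summary
    ctx = global_context(img, g)
    return [(patch, ctx) for patch in split_into_patches(img, p)]
```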
https://arxiv.org/abs/2405.07166
Semi-supervised anomaly detection, which helps guarantee the reliability of intelligent systems, has received increasing attention. However, existing methods rely too heavily on data correlation and neglect causality, which can be misleading due to confounding factors and can affect system reliability. Additionally, current reinforcement learning anomaly detection methods can effectively identify known and unknown anomalies in environments with limited labeled samples. Despite their effectiveness, these methods still face several challenges, such as under-utilization of prior knowledge, lack of model flexibility, and insufficient reward feedback when interacting with the environment. To address these problems, this paper constructs a novel counterfactual causal reinforcement learning model, termed Triple-Assisted Causal Reinforcement Learning Anomaly Detector (Tri-CRLAD). The model utilizes a causal inference mechanism to radically improve the performance of semi-supervised models and enhance the model's ability to uncover anomalous data in the face of unknown or rare data. In addition, Tri-CRLAD features a triple decision support mechanism, namely, a sampling strategy based on historical similarity, an adaptive threshold smoothing adjustment strategy, and an adaptive decision reward mechanism. These mechanisms further enhance the flexibility and generalization ability of the model, enabling it to respond effectively to various complex and dynamically changing environments. Finally, Tri-CRLAD matches or exceeds the performance of 9 baseline methods across 7 diverse intelligent system datasets, including satellite, medical, and health systems. Moreover, anomaly detection stability improved significantly, by up to 23%, with an extremely small number of known anomaly samples. Our code is available at this https URL
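One of the three support mechanisms, adaptive threshold smoothing, plausibly amounts to an exponential moving average of the decision threshold toward a running score quantile. This is a hypothetical sketch: the quantile target `q` and smoothing factor `beta` are assumptions, not Tri-CRLAD's published update rule.

```python
import numpy as np

def smooth_threshold(prev_thr, batch_scores, beta=0.9, q=0.95):
    # pull the anomaly threshold smoothly toward a high quantile of the
    # latest batch of anomaly scores (illustrative assumption)
    target = float(np.quantile(batch_scores, q))
    return beta * prev_thr + (1 - beta) * target
```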
https://arxiv.org/abs/2405.06925
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at this https URL.
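The multi-scale depth-wise convolution at the heart of the decoder applies one k×k filter per channel at several kernel sizes, which is what makes the block cheap. A naive NumPy sketch; the box filters and plain summation across scales are placeholders, and EMCAD's actual block also includes normalisation and the gated attention mechanisms.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    # x: (C, H, W); kernels: (C, k, k); one filter per channel ("depth-wise")
    C, H, W = x.shape
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = (xp[c, i:i + k, j:j + k] * kernels[c]).sum()
    return out

def multiscale_dwconv(x, scales=(1, 3, 5)):
    # sum of depth-wise convolutions at several kernel sizes (sketch of the
    # multi-scale block; box filters are used purely for illustration)
    C = x.shape[0]
    out = np.zeros_like(x, dtype=float)
    for k in scales:
        kernels = np.full((C, k, k), 1.0 / (k * k))
        out += depthwise_conv2d(x, kernels)
    return out
```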
https://arxiv.org/abs/2405.06880
Deep neural networks (DNNs) have been used to create models for many complex analysis problems like image recognition and medical diagnosis. DNNs are a popular tool within machine learning due to their ability to model complex patterns and distributions. However, the performance of these networks is highly dependent on the quality of the data used to train the models. Two characteristics of these datasets, noisy labels and training-set biases, are known to frequently cause poor generalization performance as a result of overfitting to the training set. This paper aims to solve this problem using the approach proposed by Ren et al. (2018) based on meta-training and online weight approximation. We first implement a toy problem to crudely verify the claims made in Ren et al. (2018) and then apply the approach to a real-world problem: skin-cancer detection using an imbalanced image dataset.
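The online weight approximation of Ren et al. (2018) up-weights training examples whose gradients align with a small clean meta set and zeroes the rest. A heavily simplified sketch of just the weighting step; the sign convention and normalisation follow the spirit of the method, not its exact meta-gradient computation.

```python
import numpy as np

def reweight_examples(meta_grads, eps=1e-8):
    # meta_grads[i]: gradient of the meta/validation loss w.r.t. the weight
    # of training example i. Examples that would *decrease* the meta loss
    # (negative gradient) get positive weight; others are clipped to zero,
    # then the weights are normalised to sum to one.
    w = np.maximum(-np.asarray(meta_grads, dtype=float), 0.0)
    s = w.sum()
    return w / s if s > eps else np.ones_like(w) / len(w)
```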
https://arxiv.org/abs/2405.06859
Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures across clients (system heterogeneity) pose significant challenges to effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods using knowledge distillation require public datasets, raising privacy and data-collection issues. Additionally, these datasets require additional local computing and storage resources, which is a burden for medical institutions with limited hardware. In this paper, we introduce a novel federated learning paradigm, named Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework leverages a lightweight messenger model that carries concentrated information to collect the information from each client. We also develop a set of receiver and transmitter modules to receive and send information from the messenger model, so that information can be injected and distilled efficiently.
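The injection-and-distillation idea, clients compressing knowledge into a lightweight messenger that the server aggregates, can be caricatured with softened predictions averaged across clients. The temperature softmax and mean aggregation are illustrative assumptions; MH-pFLID's receiver and transmitter modules are learned components, not fixed functions.

```python
import numpy as np

def distill_to_messenger(client_logits, temperature=2.0):
    # "distillation": a client expresses its knowledge as a softened
    # probability distribution over classes (temperature softmax)
    z = np.asarray(client_logits, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_messengers(soft_targets):
    # the lightweight messenger carries the averaged soft knowledge,
    # which heterogeneous clients can then "inject" into their own models
    return np.mean(np.asarray(soft_targets), axis=0)
```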
https://arxiv.org/abs/2405.06822
The circular economy paradigm is gaining interest as a solution to reduce both material supply uncertainties and waste generation. One of the main challenges is monitoring materials, since in general, something that is not measured cannot be effectively managed. In this paper, we propose real-time synchronized object detection to enable, at the same time, autonomous sorting, mapping, and quantification of end-of-life medical materials. Dataset, code, and demo videos are publicly available.
https://arxiv.org/abs/2405.06821
Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death in patients. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH conditions, early treatment is crucial, necessitating rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. Therefore, we built a dataset for ICH and normal classification and three types of ICH image classification based on the hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a dual-task vision transformer (DTViT) for the automated classification and diagnosis of ICH images. This neural network utilizes the encoder from ViT, employing attention mechanisms for feature extraction from CT images. We incorporated two multilayer perceptron (MLP)-based decoders within the network to simultaneously identify the presence of ICH and classify three types of hemorrhage locations. Experimental results demonstrate that our proposed multi-classification network performs well on the built real-world test dataset. The code and dataset for this study will be made publicly available upon paper acceptance at: this https URL.
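The dual-task head structure, one shared encoding feeding two MLP decoders, can be sketched as follows. The layer shapes and parameter packing are placeholders; in DTViT the shared encoding comes from a ViT encoder.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # a two-layer perceptron with ReLU hidden activation
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def dual_task_heads(encoded, params_presence, params_location):
    # two MLP decoders on one shared encoding: (1) ICH vs. normal,
    # (2) hemorrhage location among {Deep, Subcortical, Lobar}
    presence_logits = mlp(encoded, *params_presence)   # 2 classes
    location_logits = mlp(encoded, *params_location)   # 3 classes
    return presence_logits, location_logits
```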
https://arxiv.org/abs/2405.06814
Patient hand-off and triage are two fundamental problems in health care. Often doctors must painstakingly summarize complex findings to efficiently communicate with specialists and quickly decide which patients have the most urgent cases. In pursuit of these challenges, we present (1) a model with state-of-the-art radiology report summarization performance using (2) a novel method for augmenting medical data, and (3) an analysis of the model's limitations and radiology knowledge gain. We also provide a data processing pipeline for future models developed on the MIMIC-CXR dataset. Our best-performing model was a fine-tuned BERT-to-BERT encoder-decoder with 58.75/100 ROUGE-L F1, which outperformed specialized checkpoints with more sophisticated attention mechanisms. We investigate these aspects in this work.
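ROUGE-L F1, the metric quoted above, is computed from the longest common subsequence (LCS) between candidate and reference summaries. A standard word-level implementation sketch:

```python
def lcs_len(a, b):
    # longest common subsequence length via dynamic programming
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def rouge_l_f1(candidate, reference):
    # word-level ROUGE-L: precision and recall from the LCS length
    c, r = candidate.split(), reference.split()
    l = lcs_len(c, r)
    if l == 0:
        return 0.0
    p, rec = l / len(c), l / len(r)
    return 2 * p * rec / (p + rec)
```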
https://arxiv.org/abs/2405.06802
Denoising diffusion models (DDM) have gained recent traction in medical image translation given improved training stability over adversarial models. DDMs learn a multi-step denoising transformation to progressively map random Gaussian-noise images onto target-modality images, while receiving stationary guidance from source-modality images. As this denoising transformation diverges significantly from the task-relevant source-to-target transformation, DDMs can suffer from weak source-modality guidance. Here, we propose a novel self-consistent recursive diffusion bridge (SelfRDB) for improved performance in medical image translation. Unlike DDMs, SelfRDB employs a novel forward process with start- and end-points defined based on target and source images, respectively. Intermediate image samples across the process are expressed via a normal distribution with mean taken as a convex combination of start-end points, and variance from additive noise. Unlike regular diffusion bridges that prescribe zero variance at start-end points and high variance at mid-point of the process, we propose a novel noise scheduling with monotonically increasing variance towards the end-point in order to boost generalization performance and facilitate information transfer between the two modalities. To further enhance sampling accuracy in each reverse step, we propose a novel sampling procedure where the network recursively generates a transient-estimate of the target image until convergence onto a self-consistent solution. Comprehensive analyses in multi-contrast MRI and MRI-CT translation indicate that SelfRDB offers superior performance against competing methods.
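The forward process described above, with the mean a convex combination of the end-points and noise variance growing toward the source, can be written as x_t ~ N((1 - α_t)·x_target + α_t·x_source, σ_t²·I). A sketch with linear schedules; the specific α_t and σ_t here are assumptions, since the paper designs its own monotonically increasing noise schedule.

```python
import numpy as np

def bridge_sample(x_target, x_source, t, T, sigma_max=0.1, rng=None):
    # forward process of the bridge: start-point is the target image (t = 0),
    # end-point is the source image (t = T); noise variance increases
    # monotonically toward the end-point (linear schedule as an assumption)
    if rng is None:
        rng = np.random.default_rng(0)
    alpha = t / T                          # 0 at the target, 1 at the source
    mean = (1 - alpha) * x_target + alpha * x_source
    sigma = sigma_max * alpha
    return mean + sigma * rng.standard_normal(x_target.shape)
```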
https://arxiv.org/abs/2405.06789
We introduce SAM3D, a new approach to semi-automatic zero-shot segmentation of 3D images building on the existing Segment Anything Model (SAM). We achieve fast and accurate segmentations in 3D images with a four-step strategy comprising: volume slicing along non-orthogonal axes, efficient prompting in 3D, slice-wise inference using the pretrained SAM, and recomposition and refinement in 3D. We evaluated SAM3D qualitatively on an array of imaging modalities and anatomical structures and quantified its performance for specific organs in body CT and tumors in brain MRI. By enabling users to create 3D segmentations of unseen data quickly and with dramatically reduced manual input, these methods have the potential to aid surgical planning and education, diagnostic imaging, and scientific research.
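The slice-infer-recompose strategy can be sketched with orthogonal axes and majority-vote recomposition. This simplifies the method: SAM3D additionally slices along non-orthogonal axes, prompts in 3D, and refines the result, and `segment_slice` here stands in for the pretrained SAM.

```python
import numpy as np

def segment_volume(vol, segment_slice, axes=(0, 1, 2)):
    # run a 2D segmenter slice-wise along several axes, then recompose
    # the per-axis predictions in 3D by majority vote
    votes = np.zeros(vol.shape, dtype=float)
    for ax in axes:
        pred = np.stack([segment_slice(s) for s in np.moveaxis(vol, ax, 0)])
        votes += np.moveaxis(pred, 0, ax)
    return votes > (len(axes) / 2)
```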
https://arxiv.org/abs/2405.06786
As Transformers have become state-of-the-art models for natural language processing (NLP) tasks, the need to understand and explain their predictions is increasingly apparent. Especially in unsupervised applications, such as information retrieval tasks, similarity models built on top of foundation model representations have been widely applied. However, their inner prediction mechanisms have mostly remained opaque. Recent advances in explainable AI have made it possible to mitigate these limitations by leveraging improved explanations for Transformers through layer-wise relevance propagation (LRP). Using BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, we investigate which feature interactions drive similarity in NLP models. We validate the resulting explanations and demonstrate their utility in three corpus-level use cases, analyzing grammatical interactions, multilingual semantics, and biomedical text retrieval. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
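The intuition behind second-order explanations of similarity can be seen in closed form for a linear feature map f(z) = Wz, where the dot-product similarity f(x)·f(y) decomposes exactly into pairwise feature interactions. BiLRP itself propagates relevance through deep Transformer layers; the linear case below is only the trivial special case.

```python
import numpy as np

def bilrp_linear(x, y, W):
    # For f(z) = W z, the similarity f(x).f(y) = x^T (W^T W) y decomposes
    # into interaction relevances R[i, j] = x_i * (W^T W)[i, j] * y_j,
    # with R.sum() recovering the similarity score exactly (conservation).
    M = W.T @ W
    return x[:, None] * M * y[None, :]
```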
https://arxiv.org/abs/2405.06604