Self-supervised contrastive learning has emerged as one of the most successful deep learning paradigms. In this regard, it has seen extensive use in image registration and, more recently, in the particular field of medical image registration. In this work, we propose to test, extend, and improve a state-of-the-art framework for color fundus image registration, ConKeD. Using the ConKeD framework, we test multiple loss functions, adapting them to the framework and the application domain. Furthermore, we evaluate our models using the standardized benchmark dataset FIRE as well as several datasets that have never been used before for color fundus registration, for which we are releasing the pairing data as well as a standardized evaluation approach. Our work demonstrates state-of-the-art performance across all datasets and metrics, demonstrating several advantages over current SOTA color fundus registration methods.
自监督对比学习已成为最成功的深度学习范式之一。在这方面,它在图像配准领域,以及更近期的医学图像配准这一特定领域,得到了广泛应用。在这项工作中,我们提出测试、扩展并改进一个最先进的彩色眼底图像配准框架ConKeD。基于ConKeD框架,我们测试了多种损失函数,并将其适配到该框架和应用领域。此外,我们使用标准化基准数据集FIRE以及若干此前从未用于彩色眼底配准的数据集来评估我们的模型,并公开这些数据集的配对数据和标准化评估方法。我们的工作在所有数据集和指标上都达到了最先进的性能,相比当前最先进的彩色眼底配准方法展现出多项优势。
https://arxiv.org/abs/2404.16773
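The abstract above does not specify which contrastive losses were tested; as an illustration of the general family, a minimal InfoNCE-style loss over keypoint descriptors (one positive and several negatives per anchor) might look like the following sketch in plain Python. The function name and the cosine-similarity choice are assumptions for illustration, not details from the paper.

```python
import math

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss over cosine similarities: one positive, several negatives.

    anchor, positive: descriptor vectors; negatives: list of descriptor vectors.
    """
    def cos(u, v):
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return sum(x * y for x, y in zip(u, v)) / (nu * nv)

    # Positive logit first, then negatives, all scaled by the temperature.
    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -math.log(math.exp(logits[0] - m) / denom)
```

The loss is small when the positive descriptor aligns with the anchor better than the negatives do, and large otherwise.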
Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirement for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original dataset (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training them to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
在医学人工智能(AI4Medicine)领域,开发通用基础模型近来引起了研究者的巨大关注。开发这些模型的一个关键洞见是它们对数据集规模扩展的依赖,这凸显了开发涵盖多种成像模态、包含多样监督信号的开源医学图像数据集的需求。在本文中,我们介绍了RadGenome-Chest CT,一个基于CT-RATE的全面、大规模、区域引导的3D胸部CT解读数据集。具体来说,我们利用最新的强大通用分割模型和大型语言模型,从以下方面扩展原始数据集(来自20,000名患者的超过25,692个非增强3D胸部CT体积及报告):(一)覆盖197个类别的器官级分割掩码,为解读提供中间推理的视觉线索;(二)66.5万条多粒度定位(grounded)报告,其中报告的每个句子都以分割掩码的形式链接到CT体积中相应的解剖区域;(三)130万个定位VQA对,其中问题和答案均与参考分割掩码链接,使模型能够将视觉证据与文本解释相关联。验证集中所有定位报告和VQA对均经过人工核验,以确保数据集质量。我们相信,RadGenome-Chest CT能够通过训练模型基于给定分割区域生成文本,显著推动多模态医学基础模型的发展,而这是之前的相关数据集无法实现的。我们将发布所有分割掩码、定位报告和VQA对,以促进该领域的进一步研究与发展。
https://arxiv.org/abs/2404.16754
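To illustrate what "grounded" supervision could look like in practice, here is a hypothetical record layout linking report sentences and VQA pairs to segmentation masks. The field names and file paths below are invented for illustration; the actual RadGenome-Chest CT schema may differ.

```python
# Hypothetical record layout; the real RadGenome-Chest CT schema may differ.
record = {
    "volume_id": "ct_00001",
    "grounded_report": [
        {"sentence": "Ground-glass opacity in the right upper lobe.",
         "region": "right_upper_lobe",
         "mask_file": "ct_00001/right_upper_lobe.nii.gz"},
    ],
    "grounded_vqa": [
        {"question": "Is there any opacity in the right lung?",
         "answer": "Yes, in the right upper lobe.",
         "mask_file": "ct_00001/right_upper_lobe.nii.gz"},
    ],
}

def masks_for_region(rec, region):
    """Collect the mask files whose report sentences are grounded in a region."""
    return [s["mask_file"] for s in rec["grounded_report"] if s["region"] == region]
```

The point of such a layout is that a model can be trained to generate the sentence conditioned on the region named by the mask, which is what the abstract describes as unattainable with earlier datasets.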
This report lists 13 functional conditions, cashed out in computational terms, that have been argued to be constitutive of conscious valenced experience. These are extracted from existing empirical and theoretical literature on, among others, animal sentience, medical disorders, anaesthetics, philosophy, evolution, neuroscience, and artificial intelligence.
本报告列举了13项以计算术语刻画的功能条件,这些条件被论证为有意识效价体验的构成要素。它们提取自关于动物感知、医学疾病、麻醉、哲学、进化、神经科学和人工智能等领域的现有实证与理论文献。
https://arxiv.org/abs/2404.16696
Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through model fine-tuning, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL through a log-probability-based distribution, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verify that our method can filter unanswerable questions, that it can be widely applied even when the model's parameters are not accessible, and that it is effective in practice.
近年来,基于深度学习的语言模型显著提升了文本到SQL任务,在医疗领域检索病历记录方面具有广阔的应用前景。此类应用中的一个显著挑战是识别不可回答的查询。通过微调模型,我们证明了将病历查询转换为SQL查询的可行性。此外,我们引入了一种基于熵的方法来识别并过滤不可回答的结果。我们进一步通过基于对数概率分布过滤低置信度SQL来提升结果质量,并通过在真实数据库上执行查询来减少语法和模式错误。实验验证表明,我们的方法能够过滤不可回答的问题,即使在无法访问模型参数时也可广泛使用,并能在实践中得到有效应用。
https://arxiv.org/abs/2404.16659
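A minimal sketch of the entropy-based abstention idea: average the Shannon entropy of the per-token output distributions of a decoded SQL query, and abstain when it exceeds a threshold. The threshold value and function names are illustrative assumptions, not the paper's exact formulation.

```python
import math

def mean_token_entropy(token_dists):
    """Average Shannon entropy (nats) over per-token probability distributions."""
    ents = [-sum(p * math.log(p) for p in dist if p > 0) for dist in token_dists]
    return sum(ents) / len(ents)

def filter_answer(sql, token_dists, max_entropy=0.5):
    """Return the SQL only when the model decoded it confidently; else abstain.

    `max_entropy` is an assumed cutoff; in practice it would be tuned on a
    validation set of answerable vs. unanswerable questions.
    """
    return sql if mean_token_entropy(token_dists) <= max_entropy else None
```

Because this only needs the token probabilities of the generated text, it can be applied to API-served models whose weights are not accessible, which is the setting the abstract highlights.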
The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI. We present Hippocrates, an open-source LLM framework specifically developed for the medical domain. In stark contrast to previous efforts, it offers unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols. This open approach is designed to stimulate collaborative research, allowing the community to build upon, refine, and rigorously evaluate medical LLMs within a transparent ecosystem. We also introduce Hippo, a family of 7B models tailored for the medical domain, fine-tuned from Mistral and LLaMA2 through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback. Our models outperform existing open medical LLMs by a large margin, even surpassing models with 70B parameters. Through Hippocrates, we aspire to unlock the full potential of LLMs not just to advance medical knowledge and patient care but also to democratize the benefits of AI research in healthcare, making them available across the globe.
将大型语言模型(LLMs)融入医疗保健有望彻底改变医疗诊断、研究和患者护理。然而,医疗LLM的发展面临诸多障碍,如复杂的训练要求、严格的评估需求,以及限制学术探索的专有模型的主导地位。透明、全面地获取LLM资源对于推动该领域发展、促进可复现性以及鼓励医疗AI创新至关重要。我们推出Hippocrates,一个专为医疗领域开发的开源LLM框架。与以往工作形成鲜明对比的是,它对训练数据集、代码库、检查点和评估协议提供不受限制的访问。这种开放方式旨在促进协作研究,使社区能够在透明的生态系统中构建、改进并严格评估医疗LLM。此外,我们还推出了Hippo,一系列面向医疗领域的7B模型,基于Mistral和LLaMA2,通过持续预训练、指令微调以及来自人类和AI反馈的强化学习进行微调。我们的模型大幅超越现有的开源医疗LLM,甚至超过了具有70B参数的模型。通过Hippocrates,我们希望释放LLM的全部潜力,不仅推动医学知识和患者护理的发展,还使医疗AI研究的成果民主化,惠及全球。
https://arxiv.org/abs/2404.16621
Weakly supervised medical image segmentation (MIS) using generative models is crucial for clinical diagnosis. However, the accuracy of the segmentation results is often limited by insufficient supervision and the complex nature of medical imaging. Existing models also only provide a single outcome, which does not allow for the measurement of uncertainty. In this paper, we introduce DiffSeg, a segmentation model for skin lesions based on diffusion difference, which exploits diffusion model principles to extract noise-based features from images with diverse semantic information. By discerning differences between these noise features, the model identifies diseased areas. Moreover, its multi-output capability mimics doctors' annotation behavior, facilitating the visualization of segmentation result consistency and ambiguity. Additionally, it quantifies output uncertainty using Generalized Energy Distance (GED), aiding interpretability and decision-making for physicians. Finally, the model integrates outputs through the Dense Conditional Random Field (DenseCRF) algorithm to refine the segmentation boundaries by considering inter-pixel correlations, which improves the accuracy and optimizes the segmentation results. We demonstrate the effectiveness of DiffSeg on the ISIC 2018 Challenge dataset, outperforming state-of-the-art U-Net-based methods.
弱监督医学图像分割(MIS)借助生成模型,对临床诊断至关重要。然而,分割结果的准确性常常受到监督不足和医学影像复杂性的限制。现有模型也仅提供单一输出,无法衡量不确定性。在本文中,我们介绍了DiffSeg,一种基于扩散差异的皮肤病变分割模型,它利用扩散模型原理,从具有多样语义信息的图像中提取基于噪声的特征。通过辨别这些噪声特征之间的差异,模型识别出病变区域。此外,其多输出能力模仿了医生的标注行为,便于可视化分割结果的一致性与模糊性。同时,它使用泛化能量距离(GED)量化输出不确定性,有助于医生的解释与决策。最后,模型通过Dense Conditional Random Field(DenseCRF)算法整合输出,考虑像素间相关性来细化分割边界,从而提高准确性并优化分割结果。我们在ISIC 2018挑战数据集上证明了DiffSeg的有效性,其性能超越了基于U-Net的最先进方法。
https://arxiv.org/abs/2404.16474
Cell tracking remains a pivotal yet challenging task in biomedical research. The full potential of deep learning for this purpose is often untapped due to the limited availability of comprehensive and varied training data sets. In this paper, we present SynCellFactory, a generative cell video augmentation. At the heart of SynCellFactory lies the ControlNet architecture, which has been fine-tuned to synthesize cell imagery with photorealistic accuracy in style and motion patterns. This technique enables the creation of synthetic yet realistic cell videos that mirror the complexity of authentic microscopy time-lapses. Our experiments demonstrate that SynCellFactory boosts the performance of well-established deep learning models for cell tracking, particularly when original training data is sparse.
细胞追踪在生物医学研究中仍然是一个关键但具有挑战性的任务。由于深度学习在为此目的的全面且多样化的训练数据集的可用性方面往往被低估,因此深度学习在此任务上的全部潜力常常未被充分利用。在本文中,我们提出了SynCellFactory,一种生成细胞视频的增强方法。SynCellFactory的核心是ControlNet架构,该架构已通过在风格和运动模式上合成细胞图像来提高其准确性。这种技术能够创建与真实显微镜时间间隔复杂性相仿的合成细胞视频。我们的实验结果表明,SynCellFactory能够显著提高已有的深度学习模型在细胞追踪方面的性能,特别是当原始训练数据稀疏时。
https://arxiv.org/abs/2404.16421
In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating and adapting Parameter-Efficient Fine-Tuning (PEFT) methods specifically for Med-VLMs is essential. Most of the current PEFT methods on Med-VLMs have yet to be comprehensively investigated but mainly focus on adding some components to the model's structure or input. However, fine-tuning intrinsic model components often yields better generality and consistency, and its impact on the ultimate performance of Med-VLMs has been widely overlooked and remains understudied. In this paper, we endeavour to explore an alternative to traditional PEFT methods, especially the impact of fine-tuning LayerNorm layers, FFNs and Attention layers on the Med-VLMs. Our comprehensive studies span both small-scale and large-scale Med-VLMs, evaluating their performance under various fine-tuning paradigms across tasks such as Medical Visual Question Answering and Medical Imaging Report Generation. The findings reveal unique insights into the effects of intrinsic parameter fine-tuning methods on fine-tuning Med-VLMs to downstream tasks and expose fine-tuning solely the LayerNorm layers not only surpasses the efficiency of traditional PEFT methods but also retains the model's accuracy and generalization capabilities across a spectrum of medical downstream tasks. The experiments show LayerNorm fine-tuning's superior adaptability and scalability, particularly in the context of large-scale Med-VLMs.
在医疗可视语言模型(Med-VLMs)领域,寻求通用的有效微调方法仍然是至关重要的,尤其是在跨学科领域的研究者通常缺乏训练资源的情况下,而这一领域也往往被广泛探索。考虑到医疗领域的独特挑战,如有限的数据范围和显著的领域特定要求,专门为Med-VLMs评估和适应参数高效的微调(PEFT)方法至关重要。目前,大多数关于Med-VLMs的PEFT方法尚未进行全面的调查,但主要集中在向模型结构或输入中添加一些组件。然而,微调固有模型组件通常会产生更好的泛化能力和一致性,对其在Med-VLMs最终性能的影响却被广泛忽视和未研究。在本文中,我们力求探讨一种不同于传统PEFT方法的新型选择,特别是对LayerNorm层、FFN和Attention层的微调对Med-VLMs的影响。我们全面的研究跨越了小规模和大型Med-VLMs,在各种任务上评估它们在不同微调范式下的性能,例如医疗视觉问答和医疗图像报告生成。研究结果揭示了在微调固有参数方法对微调Med-VLMs的影响以及仅对LayerNorm层进行微调不仅超越了传统PEFT方法的效率,而且保留了模型的准确性和泛化能力。实验表明,LayerNorm微调的适应性和可扩展性在大型Med-VLMs方面具有优势。
https://arxiv.org/abs/2404.16385
The use of multimodal data in assisted diagnosis and segmentation has emerged as a prominent area of interest in current research. However, one of the primary challenges is how to effectively fuse multimodal features. Most of the current approaches focus on the integration of multimodal features while ignoring the correlation and consistency between different modal features, leading to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to simultaneously extract features from each modality. Leveraging the Cross Transformer, it queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features. Additionally, we incorporate a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset, and in the CT-MRI multimodal image segmentation task, we successfully improved the whole-heart segmentation DICE score to 85.57 and MIoU to 75.51. Compared to other multimodal segmentation techniques, our method outperforms by margins of 2.83 and 4.23, respectively. This demonstrates the efficacy of MicFormer in integrating relevant information between different modalities in multimodal tasks. These findings hold significant implications for multimodal image tasks, and we believe that MicFormer possesses extensive potential for broader applications across various domains. Access to our method is available at this https URL
在当前的研究中,多模态数据在辅助诊断和分割中的应用已成为一个突出的研究领域。然而,一个主要挑战是如何有效地融合多模态特征。目前的大多数方法都关注于多模态特征的整合,而忽略了不同模态特征之间的相关性和一致性,导致包含可能无关的信息。为解决这个问题,我们引入了一种创新的多模态信息交叉变换(MicFormer)方法,它采用双流架构同时提取每个模态的特征。利用交叉变换,它从一个模态提取特征并从另一个模态检索相应的响应,促进不同模态特征之间的有效沟通。此外,我们还引入了一个可变的Transformer架构来扩展搜索空间。我们在MM-WHS数据集上进行了实验,并在CT-MRI多模态图像分割任务中成功将整体心分割DICE得分提高至85.57,MIoU至75.51。与其他多模态分割技术相比,我们的方法在优势上明显超过2.83和4.23。这些发现对于多模态图像任务具有重要的意义,我们相信MicFormer在各种领域的广泛应用具有极大的潜力。您可以通过以下链接访问我们的方法:https://www. researchgate.net/publication/333003621_Multimodal_Information_Cross_Transformer
https://arxiv.org/abs/2404.16371
Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noises presented in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications. Therefore, it is desired to design a light-weight network with high performance for retinal layer segmentation. In this paper, we propose LightReSeg for retinal layer segmentation which can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure, where the encoder part employs multi-scale feature extraction and a Transformer block for fully exploiting the semantic information of feature maps at all scales and making the features have better global reasoning capabilities, while the decoder part, we design a multi-scale asymmetric attention (MAA) module for preserving the semantic information at each encoder scale. The experiments show that our approach achieves a better segmentation performance compared to the current state-of-the-art method TransUnet with 105.7M parameters on both our collected dataset and two other public datasets, with only 3.3M parameters.
自动眼层分割医疗图像,如光学共焦断层扫描(OCT)图像,是诊断眼部疾病的重要工具。然而,由于图像中呈现低对比度和血流噪声,实现准确分割具有挑战性。此外,算法应该轻便,以便在实际临床应用中部署。因此,我们希望设计一个轻量高性能的眼层分割网络。在本文中,我们提出了LightReSeg用于眼层分割,可以应用于OCT图像。具体来说,我们的方法遵循编码器-解码器结构,其中编码器部分采用多尺度特征提取和Transformer模块,以充分利用所有尺度下的特征图的语义信息,并使特征具有更好的全局推理能力,而解码器部分,我们设计了一个多尺度非对称注意力(MAA)模块,用于保留每个编码器尺度下的语义信息。实验证明,我们的方法在两个公开数据集上的性能优于当前最先进的方法TransUnet,具有更少的参数,分别为105.7M和2M。
https://arxiv.org/abs/2404.16346
Despite the remarkable success of deep learning in medical imaging analysis, medical image segmentation remains challenging due to the scarcity of high-quality labeled images for supervision. Further, the significant domain gap between natural and medical images in general and ultrasound images in particular hinders fine-tuning models trained on natural images to the task at hand. In this work, we address the performance degradation of segmentation models in low-data regimes and propose a prompt-less segmentation method harnessing the ability of segmentation foundation models to segment abstract shapes. We do that via our novel prompt point generation algorithm which uses coarse semantic segmentation masks as input and a zero-shot prompt-able foundation model as an optimization target. We demonstrate our method on a segmentation findings task (pathologic anomalies) in ultrasound images. Our method's advantages are brought to light in varying degrees of low-data regime experiments on a small-scale musculoskeletal ultrasound images dataset, yielding a larger performance gain as the training set size decreases.
尽管在医学影像分析中深度学习的成功已经让人印象深刻,但由于高质量 labeled 图像的稀缺性,医学图像分割仍然具有挑战性。此外,自然图像和医学图像以及超声图像之间显著的领域差距会阻碍将基于自然图像训练的模型用于当前任务的微调。在这项工作中,我们解决了在低数据 regime 下分割模型的性能下降问题,并提出了一个无需提示的分割方法,利用分割基础模型的能力对抽象形状进行分割。我们通过使用粗粒度语义分割掩码作为输入和零散提示可优化目标来实现这一目标。我们在超声图像数据集上展示了我们的方法。在小的多关节超声图像数据集上进行低数据 regime 实验,各种低数据 regime 实验都表明,随着训练集大小的减小,性能提高。
https://arxiv.org/abs/2404.16325
Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.
大规模语言模型(LLMs)中的提示泄露对安全和隐私构成重大威胁,尤其是在检索增强生成(RAG)系统中。然而,在多轮LLM交互中,以及缓解策略,对提示泄露的研究还没有以标准化方式进行。本文研究了4个不同领域和10个开源LLM和闭源LLM对提示泄露的漏洞。我们独特的多轮威胁模型利用了LLM的协同效应,并分析了LLM响应中的任务指令和知识泄露。在多轮设置中,我们的威胁模型将平均攻击成功率(ASR)提高至86.2%,包括GPT-4和claude-1.3的99%泄漏。我们发现,一些黑盒LLM,如Gemini,在领域之间表现出不同的泄漏倾向 - 他们在新闻领域比医疗领域更容易泄露上下文知识。我们的实验测量了6个黑盒防御策略的具体效果,包括在RAG场景中的查询重写器。我们提出的多层防御组合对黑盒LLM的ASR为5.3%,表明还有提高的空间和未来LLM安全研究的发展方向。
https://arxiv.org/abs/2404.16251
Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included previous linguistic knowledge on pain utterances in their experimental setup. Most studies targeted their outcomes for physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. The least studied dimensions of pain were affective and sociocultural. Only two studies measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This study found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient-empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.
研究目标:本研究旨在系统地回顾有关疼痛语言计算的相关文献,无论是由患者还是医生产生的,以识别当前的趋势和挑战。方法:遵循PRISMA指南,进行全面的文献搜索,以选择与疼痛语言计算相关的研究,并回答预先设定的研究问题。数据提取和合成是将所选研究根据其主要目的和结果、患者和痛苦人群、文本数据、计算方法以及结果目标进行分类的过程。结果:医生产生的疼痛语言,特别是从病历中提取的数据,是最常用的数据。任务包括患者诊断和分诊、疼痛提及的识别、治疗反应预测、生物医学实体提取、语言特征与临床状态的关联以及疼痛叙述的词汇-语义分析。只有1篇论文包括了他们在实验设置中之前对疼痛语句的语言知识。大多数研究将重点放在医生身上,无论是直接作为临床工具,还是作为间接知识。最少的针对性临床疼痛护理阶段是自我管理,其中患者最积极参与。最少的疼痛研究维度是情感和社会文化方面。只有2篇论文测量了医生在临床任务中表现随着所提出的算法的引入而改善。讨论:本研究发现,未来的研究应该集中于分析患者产生的疼痛语言,为自我管理和患者赋权开发基于患者的资源,探索疼痛的情感和社会文化方面,以及衡量医生在使用所提出的工具时的表现改善。
https://arxiv.org/abs/2404.16226
Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the most related sentences of the records to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model showed the overall micro and macro F measures of 0.9061 and 0.8060 which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria received promising scores. Besides, we proposed a method of extractive summarization with the aid of SNOMED CT ontology that can be also applied to other medical texts.
目标:临床试验对于推动制药干预至关重要,但在选择合适参与者方面存在瓶颈。尽管利用电子病历(EHR)进行招募的做法已经受到欢迎,但非结构化医疗文本复杂的 nature 提出了有效地识别参与者的挑战。自然语言处理(NLP)技术在最近关注于Transformer模型方面成为了解决方案。在这项研究中,我们旨在评估基于提示的大型语言模型在从EHR中收集的非结构化医疗文本的队列选择任务中的性能。方法:为了处理医学记录,我们选择了与需要试验资格标准相关的最相关的句子。收集了与每个资格标准相关的SNOMED CT概念。同时,根据SNOMED CT语义数据库对医学记录进行了注释。包括与标准匹配的概念的注解句子被提取出来。然后,使用基于提示的大型语言模型(本研究中使用的是Generative Pre-trained Transformer(GPT))对提取的句子进行训练。为了评估其效果,我们使用2018 n2c2挑战的数据集来评估模型的性能,该数据集旨在根据13个资格标准对311名患者的医疗记录进行分类。结果:与该数据集上进行的实验相比,我们提出的模型在整体微和宏观F分数方面得分最高,为0.9061和0.8060,这是该数据集中实现的最高分数。结论:将提示式大型语言模型应用于根据资格标准对患者进行分类,在本研究中得到了有前景的分数。此外,我们还提出了使用SNOMED CT语义数据库的提取式总结方法,该方法也可以应用于其他医学文本。
https://arxiv.org/abs/2404.16198
Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question-answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.
视觉语言模型在一般领域通常都表现出良好的效果,并且在多样化的多模态应用中如视觉问答(VQA)中表现出强大的性能。然而,在更 specialized的领域,如医学领域,这些模型很难保持同样的效果。为了克服这一问题,我们提出了一个医学视觉语言模型,该模型整合了适用于医学领域的较大视觉和语言模型。该模型通过使用三个分开的生物医学和放射学多模态视觉和文本数据集进行参数高效的训练,分别训练三个阶段。所提出的模型在SLAKE 1.0医疗VQA(MedVQA)数据集上实现了最先进的性能, overall accuracy 达到了87.5%,同时在另一个MedVQA数据集VQA-RAD上表现出强大的性能, overall accuracy 达到了73.2%。
https://arxiv.org/abs/2404.16192
The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions, and validate the model against known medical facts. We evaluate this library with two medical experts. Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations, and help experts reason about models in terms of relevant domain knowledge. However, concerns were also surfaced about the clinical plausibility of the counterfactuals generated. We conclude with a discussion on the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings on the development of domain-centered XAI methods for model interpretability in healthcare contexts.
近年来,公开可获取的大型医疗影像数据集的普及导致了许多心血管图像分类和分析的人工智能(AI)模型的出现。与此同时,这些模型的潜在影响也促使开发了一系列可解释AI(XAI)方法,旨在解释给定图像输入的模型预测。然而,许多这些方法都没有经过领域专家的开发或评估,并且解释没有针对医疗专业知识或领域知识进行contextual化。在本文中,我们提出了一个新颖的框架和Python库,MiMICRI,为心血管图像分类模型的领域中心反事实解释提供支持。MiMICRI使用户可以交互式选择和替换医学图像中与形态结构对应的区域。从反事实产生的结果中,用户可以 then评估每个片段对模型预测的影响,并验证模型是否符合已知医疗事实。我们对这个库进行了两个医疗专家的评估。我们的评估表明,以领域为中心的XAI方法可以增强模型解释的可解释性,并帮助专家在相关领域知识的基础上对模型进行推理。然而,也担忧反事实产生的临床可解释性。我们得出结论,MiMICRI框架的可解释性和可靠性,以及我们的研究结果对 healthcare 环境中模型可解释性发展的影响,都存在一定的意义。
https://arxiv.org/abs/2404.16174
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short Term Memory Networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues which use spectral networks or convolutions and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models(SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagnol State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms namely, Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, Glue, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for Mamba-360 work is available on this webpage.\url{this https URL}.
序列建模是一个贯穿各种领域的关键领域,包括自然语言处理(NLP)、语音识别、时间序列预测、音乐生成和生物信息学。递归神经网络(RNNs)和长短时记忆网络(LSTMs)历史上曾统治序列建模任务,如机器翻译、命名实体识别等。然而,Transformer的进步导致了一种范式的转移,由于它们在性能上的优越表现。然而,Transformer的注意力复杂性和处理归纳偏差的能力仍然存在挑战。为解决这些问题,已经提出了几种变体,包括使用特征网络或卷积的模型,并在各种任务上表现良好。然而,它们仍然很难处理长序列。状态空间模型(SSMs)在这一背景下出现了有前景的替代方案,尤其是S4和其变体,如S4nd、Hippo、Hyena、诊断状态空间(DSS)、Gated State Spaces(GSS)和Linear Recurrent Unit(LRU)、Liquid-S4、Mamba等。在本次调查中,我们根据三种范式对基本SSMs进行了分类,即开关架构、结构架构和循环架构。本调查还强调了SSMs在各个领域的多样化应用,如视觉、视频、音频、语音、语言(特别是长序列建模)、医学(包括基因组学)、化学(如药物设计)和推荐系统,以及时间序列分析,包括表格数据。此外,我们还分析了SSMs在基准数据集,如Long Range Arena(LRA)、WikiText、Glue、Pile、ImageNet、Kinetics-400、sstv2,以及视频数据集,如Breakfast、COIN、LVU等。Mamba-360工作的项目页面可以在该网页上查看。
https://arxiv.org/abs/2404.16112
While the field of medical image analysis has undergone a transformative shift with the integration of machine learning techniques, the main challenge of these techniques is often the scarcity of large, diverse, and well-annotated datasets. Medical images vary in format, size, and other parameters and therefore require extensive preprocessing and standardization, for usage in machine learning. Addressing these challenges, we introduce the Medical Imaging Meta-Dataset (MedIMeta), a novel multi-domain, multi-task meta-dataset. MedIMeta contains 19 medical imaging datasets spanning 10 different domains and encompassing 54 distinct medical tasks, all of which are standardized to the same format and readily usable in PyTorch or other ML frameworks. We perform a technical validation of MedIMeta, demonstrating its utility through fully supervised and cross-domain few-shot learning baselines.
尽管将机器学习技术融入医学图像分析领域已经经历了一次变革性的转变,但这种技术的主要挑战通常是缺乏大型、多样化和具有良好标注的大型数据集。 医学图像在格式、大小和其他参数上有所不同,因此需要进行广泛的预处理和标准化,以便在机器学习应用程序中使用。为解决这些挑战,我们引入了医学图像元数据集(MedIMeta),这是一个新型的多领域、多任务元数据集。MedIMeta包含19个医学图像数据集,跨越10个不同的领域,涵盖54个不同的医学任务,所有这些数据集都已标准化为相同的格式,且易于在PyTorch或其他ML框架中使用。我们通过完全监督和跨域少样本学习基准对MedIMeta进行了技术验证,证明了其实用性。
https://arxiv.org/abs/2404.16000
Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches utilize either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes a novel equivariant neural network architecture that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters - EquiLoPO Network. Our key innovations are a group convolutional operation leveraging irreducible representations as the Fourier basis and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we propose a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms state of the art. This work suggests the benefits of true rotational equivariance on SO(3) and flexible unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at \url{this https URL}.
分析体积数据具有旋转不变性或等价性是当前研究的一个活跃主题。现有的深度学习方法要么是有限离散旋转的组卷积网络,要么是具有约束滤波器结构的可调节卷积网络。本文提出了一种新颖的等价神经网络架构,可以在连续SO(3)组上实现对局部模式方向的分析等价性,同时允许无约束的训练滤波器 - EquiLoPO网络。我们的关键创新点是一个利用不可约表示作为傅里叶基的组卷积操作,以及SO(3)空间中提供输入到输出函数的良好定义的局部激活函数。通过将这些操作整合到ResNet风格的架构中,我们提出了一个克服了先前方法局限性的模型。对MedMNIST3D等多样3D医疗成像数据集的全面评估表明,我们的方法的有效性得到了充分证明,该方法 consistently超越了最先进的技术水平。这项工作揭示了真旋转等价性对SO(3)的益处以及由局部激活函数实现的可伸缩和不约束滤波器,为在体积数据上实现等价深度学习提供了灵活的框架,具有广泛的应用前景。我们的代码公开可用,通过点击以下链接访问:https:// this https URL。
https://arxiv.org/abs/2404.15979
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity with image size and increasing computational demands, the researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review these vision mamba models by categorizing them into foundational ones and enhancing them with techniques such as convolution, recurrence, and attention to improve their sophistication. We further delve into the widespread applications of Mamba in vision tasks, which include their use as a backbone in various levels of vision processing. This encompasses general visual tasks, Medical visual tasks (e.g., 2D / 3D segmentation, classification, and image registration, etc.), and Remote Sensing visual tasks. We specially introduce general visual tasks from two levels: High/Mid-level vision (e.g., Object detection, Segmentation, Video classification, etc.) and Low-level vision (e.g., Image super-resolution, Image restoration, Visual generation, etc.). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
带有选择机制和硬件感知架构的状态空间模型(SSMs),如Mamba,在长序列建模方面最近取得了显著的进展。由于Transformer中自注意力机制的复杂性随着图像尺寸的增加而增加,计算机视觉任务的计算需求也在增加,因此研究人员现在正在探索如何将Mamba适应计算机视觉任务。本文是旨在为计算机视觉领域提供对Mamba模型的深入分析的第一篇全面调查。文章首先探讨了导致Mamba成功的基本概念,包括状态空间模型框架、选择机制和硬件感知设计。接下来,我们通过分类这些视觉Mamba模型为基本模型并使用卷积、递归和注意等技术对其进行改进,来回顾这些模型。我们深入探讨了Mamba在计算机视觉任务中的广泛应用,包括在各种级别视觉处理中的作为骨干的应用。这包括一般视觉任务(如物体检测、分割、分类和图像配准等)、医学视觉任务(如2D/3D分割、分类和图像配准等)和遥感视觉任务。我们特别引入了两个层面的通用视觉任务:高/中级别视觉(如物体检测、分割、视频分类等)和低级别视觉(如图像超分辨率、图像修复、视觉生成等)。我们希望这个努力将在社区中激发更多的兴趣,以解决当前的挑战并进一步将Mamba模型应用于计算机视觉。
https://arxiv.org/abs/2404.15956