Medical imaging plays a crucial role in modern healthcare by providing non-invasive visualisation of internal structures and abnormalities, enabling early disease detection, accurate diagnosis, and treatment planning. This study aims to explore the application of deep learning models, particularly focusing on the UNet architecture and its variants, in medical image segmentation. We seek to evaluate the performance of these models across various challenging medical image segmentation tasks, addressing issues such as image normalization, resizing, architecture choices, loss function design, and hyperparameter tuning. The findings reveal that the standard UNet, when extended with a deep network layer, is a proficient medical image segmentation model, while the Res-UNet and Attention Res-UNet architectures demonstrate smoother convergence and superior performance, particularly when handling fine image details. The study also addresses the challenge of high class imbalance through careful preprocessing and loss function definitions. We anticipate that the results of this study will provide useful insights for researchers seeking to apply these models to new medical imaging problems and offer guidance and best practices for their implementation.
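The class imbalance mentioned above is commonly handled with an overlap-based objective such as the Dice loss; the sketch below is an illustrative assumption (the abstract does not give its exact loss definitions), showing a soft Dice loss in NumPy:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|X∩Y| / (|X| + |Y|).

    pred, target: arrays of foreground probabilities or binary masks.
    The eps term keeps the ratio defined for empty masks.
    """
    pred = pred.astype(float).ravel()
    target = target.astype(float).ravel()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

mask = np.array([[0, 1], [1, 1]])
print(round(dice_loss(mask, mask), 4))      # perfect overlap -> 0.0
print(round(dice_loss(mask, 1 - mask), 4))  # no overlap -> 1.0
```

Because the loss is driven by overlap rather than per-pixel counts, a small foreground class contributes as much to the gradient as a large background, which is why Dice-style losses suit imbalanced segmentation.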
This paper introduces a novel one-stage end-to-end detector specifically designed to detect small lesions in medical images. Precise localization of small lesions presents challenges due to their appearance and the diverse contextual backgrounds in which they are found. To address this, our approach introduces a new type of pixel-based anchor that dynamically moves towards the targeted lesion for detection. We refer to this new architecture as GravityNet, and the novel anchors as gravity points since they appear to be "attracted" by the lesions. We conducted experiments on two well-established medical problems involving small lesions to evaluate the performance of the proposed approach: microcalcifications detection in digital mammograms and microaneurysms detection in digital fundus images. Our method demonstrates promising results in effectively detecting small lesions in these medical imaging tasks.
This study presents FP-PET, a comprehensive approach to medical image segmentation with a focus on CT and PET images. Utilizing a dataset from the AutoPet2023 Challenge, the research employs a variety of machine learning models, including STUNet-large, SwinUNETR, and VNet, to achieve state-of-the-art segmentation performance. The paper introduces an aggregated score that combines multiple evaluation metrics such as Dice score, false positive volume (FPV), and false negative volume (FNV) to provide a holistic measure of model effectiveness. The study also discusses the computational challenges and solutions related to model training, which was conducted on high-performance GPUs. Preprocessing and postprocessing techniques, including Gaussian weighting schemes and morphological operations, are explored to further refine the segmentation output. The research offers valuable insights into the challenges and solutions for advanced medical image segmentation.
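The three metrics behind the aggregated score can be computed directly from binary masks. The weighting in `aggregated_score` below is purely illustrative, since the abstract does not specify the actual aggregation formula:

```python
import numpy as np

def seg_metrics(pred, gt, voxel_volume_ml=1.0):
    """Dice score plus false positive / false negative volumes from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    dice = 2.0 * tp / denom if denom else 1.0
    fpv = np.logical_and(pred, ~gt).sum() * voxel_volume_ml  # predicted but not tumor
    fnv = np.logical_and(~pred, gt).sum() * voxel_volume_ml  # tumor but missed
    return dice, float(fpv), float(fnv)

def aggregated_score(dice, fpv, fnv, w=(1.0, 0.01, 0.01)):
    # Illustrative weighting only; the challenge's real aggregation differs.
    return w[0] * dice - w[1] * fpv - w[2] * fnv
```

Combining an overlap measure with explicit volume errors penalizes both missed lesions and spurious uptake regions, which a Dice score alone does not separate.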
During the COVID-19 pandemic, medical imaging techniques like computed tomography (CT) scans have demonstrated effectiveness in combating the rapid spread of the virus. Therefore, it is crucial to conduct research on computerized models for the detection of COVID-19 using CT imaging. A novel processing method has been developed, utilizing radiomic features, to assist in the CT-based diagnosis of COVID-19. Given the lower specificity of traditional features in distinguishing between different causes of pulmonary diseases, the objective of this study is to develop a CT-based radiomics framework for the differentiation of COVID-19 from other lung diseases. The model is designed to focus on outlining COVID-19 lesions, as traditional features often lack specificity in this aspect. The model categorizes images into three classes: COVID-19, non-COVID-19, or normal. It employs enhancement auto-segmentation principles using intensity dark channel prior (IDCP) and deep neural networks (ALS-IDCP-DNN) within a defined range of analysis thresholds. A publicly available dataset comprising COVID-19, normal, and non-COVID-19 classes was utilized to validate the proposed model's effectiveness. The best-performing classification model, a Residual Neural Network with 50 layers (ResNet-50), attained an average accuracy, precision, recall, and F1-score of 98.8%, 99%, 98%, and 98%, respectively. These results demonstrate the capability of our model to accurately classify COVID-19 images, which could aid radiologists in diagnosing suspected COVID-19 patients. Furthermore, our model's performance surpasses that of more than 10 current state-of-the-art studies conducted on the same dataset.
Following the great success of various deep learning methods in image and object classification, the biomedical image processing community has likewise embraced their application to various automatic diagnosis tasks. Unfortunately, most deep learning-based classification attempts in the literature focus solely on extreme accuracy scores, without considering interpretability or patient-wise separation of training and test data. For example, most lung nodule classification papers using deep learning randomly shuffle the data and split it into training, validation, and test sets, so that some images from one person's CT scan land in the training set while other images of the exact same person land in the validation or test sets. This can result in misleadingly high reported accuracy rates and the learning of irrelevant features, ultimately reducing the real-life usability of these models. When deep neural networks trained with this traditional, unfair data shuffling are challenged with new patient images, the trained models perform poorly. In contrast, deep neural networks trained with strict patient-level separation maintain their accuracy rates even when new patient images are tested. Heat-map visualizations of the activations of the networks trained with strict patient-level separation indicate a higher degree of focus on the relevant nodules. We argue that the research question posed in the title has a positive answer only if the deep neural networks are trained with images of patients that are strictly isolated from the validation and testing patient sets.
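A strict patient-level split, as argued for above, can be implemented by partitioning patient IDs before touching individual images. This is a minimal sketch; the function name and the `(patient_id, image)` data layout are assumptions for illustration:

```python
import random

def patient_level_split(samples, test_frac=0.2, seed=0):
    """Split (patient_id, image) pairs so no patient spans train and test.

    Contrast with naive shuffling of individual images, which leaks
    slices of the same patient into both sets.
    """
    patients = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_frac))
    test_ids = set(patients[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test
```

The key design choice is that the random draw happens over patients, not images, so every image of a held-out patient is unseen at training time.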
Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will likely drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.
Large language models (LLMs) have demonstrated dominant performance in many NLP tasks, especially generative tasks. However, they often fall short in some information extraction tasks, particularly those requiring domain-specific knowledge, such as Biomedical Named Entity Recognition (NER). In this paper, inspired by Chain-of-Thought prompting, we leverage an LLM to solve Biomedical NER step by step: we break the NER task down into entity span extraction and entity type determination. Additionally, for entity type determination, we inject entity knowledge to compensate for the LLM's lack of domain knowledge when predicting entity categories. Experimental results show a significant improvement of our two-step BioNER approach over previous few-shot LLM baselines. Additionally, the incorporation of external knowledge significantly enhances entity category determination performance.
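The two-step decomposition can be sketched as two prompts around a generic completion function. Here `ask_llm`, `lookup_knowledge`, and the toy `KB` are hypothetical placeholders, not the paper's actual prompts or knowledge source:

```python
# Toy knowledge base: stand-in for the paper's injected entity knowledge.
KB = {"BRCA1": "BRCA1 is a human tumor-suppressor gene."}

def lookup_knowledge(span):
    return KB.get(span.strip(), "no entry found")

def extract_entities(sentence, ask_llm):
    """Two-step BioNER: (1) extract entity spans, (2) determine each span's
    type with injected knowledge. `ask_llm` is any text-completion callable;
    no specific LLM API is assumed."""
    spans = ask_llm(
        f"List the biomedical entity mentions in: '{sentence}'. "
        "One mention per line.").splitlines()
    results = []
    for span in (s for s in spans if s.strip()):
        etype = ask_llm(
            f"Context: {lookup_knowledge(span)}\n"
            f"What is the entity type of '{span}' (gene, disease, chemical)?")
        results.append((span.strip(), etype.strip()))
    return results
```

Splitting span extraction from typing lets the second prompt carry targeted domain knowledge for exactly one mention at a time, which is the point of the paper's step-by-step design.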
Rapid and accurate identification of venous thromboembolism (VTE), a severe cardiovascular condition that includes deep vein thrombosis (DVT) and pulmonary embolism (PE), is important for effective treatment. Leveraging Natural Language Processing (NLP) on radiology reports, automated methods have shown promising advances in identifying VTE events from retrospective data cohorts and in aiding clinical experts who review radiology reports. However, effectively training Deep Learning (DL) and NLP models is challenging due to limited labeled medical text data, the complexity and heterogeneity of radiology reports, and data imbalance. This study proposes a novel combination of DL methods with data augmentation, adaptive selection of pre-trained NLP models, and a clinical-expert NLP rule-based classifier to improve the accuracy of VTE identification in unstructured (free-text) radiology reports. Our experimental results demonstrate the model's efficacy, achieving an impressive 97% accuracy and 97% F1 score in predicting DVT, and an outstanding 98.3% accuracy and 98.4% F1 score in predicting PE. These findings emphasize the model's robustness and its potential to significantly contribute to VTE research.
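A rule-based classifier of the kind mentioned above can be as simple as positive and negation patterns over the report text. The patterns below are illustrative stand-ins, not the study's clinically curated rules:

```python
import re

# Illustrative patterns only; real rule sets are curated by clinical experts.
POSITIVE = re.compile(
    r"\b(deep vein thrombosis|pulmonary embolism|dvt|pe)\b", re.IGNORECASE)
NEGATION = re.compile(
    r"\bno (evidence of |acute )?(deep vein thrombosis|pulmonary embolism"
    r"|dvt|pe|thrombus)\b", re.IGNORECASE)

def classify_report(text):
    """Label a free-text radiology report for a VTE mention.

    Negation is checked first so 'No evidence of PE' is not counted
    as a positive finding.
    """
    if NEGATION.search(text):
        return "negative"
    return "positive" if POSITIVE.search(text) else "negative"
```

In a hybrid pipeline like the one described, such rules typically act as a high-precision filter or a second opinion alongside the DL classifier rather than as the sole decision maker.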
Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem impacts Generative Adversarial Networks' capacity to generate diversified images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, both varieties of the mode collapse problem are investigated, and their subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization with the Deep Convolutional GAN and Auxiliary Classifier GAN to alleviate the mode collapse problems. Synthetically generated images are utilized for data augmentation and training a Vision Transformer model. The classification performance of the model is evaluated using accuracy, recall, and precision scores. Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images as evidenced by the superior diversity scores and classification scores.
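The adaptive input-image normalization itself is paper-specific; as a simplified stand-in (an assumption, not the paper's method), per-image min-max normalization illustrates the general idea of rescaling each X-ray before it reaches the GAN:

```python
import numpy as np

def normalize_image(img, eps=1e-8):
    """Per-image min-max normalization to [0, 1].

    A simplified stand-in for adaptive input-image normalization:
    each image is rescaled by its own intensity range, so images with
    very different exposure levels occupy a comparable input space.
    """
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + eps)
```

Normalizing per image rather than per dataset keeps low-contrast scans from collapsing into a narrow intensity band, one plausible reason input normalization interacts with mode collapse.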
The increase in the availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos, as they offer a series of step-by-step procedures to accomplish particular tasks. Instructional videos from the medical domain may provide the best possible visual answers to questions about first aid, medical emergencies, and medical education. To this end, this paper focuses on answering health-related questions asked by the public by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first propose a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. We then propose monomodal and multimodal approaches that can effectively provide visual answers from medical videos to natural language questions. We conducted a comprehensive analysis of the results, focusing on the impact of the created datasets on model training and the significance of visual features in enhancing the performance of the monomodal and multimodal approaches. Our findings suggest that these datasets have the potential to enhance the performance of medical visual answer localization tasks and provide a promising direction for further performance gains using pre-trained language-vision models.
Brain tumors are collections of abnormal cells that can develop into masses or clusters. Because they have the potential to infiltrate other tissues, they pose a risk to the patient. MRI, the main imaging technique used, can identify a brain tumor with high accuracy. The rapid development of deep learning methods for computer vision has been driven by vast amounts of training data and by improvements in model construction that offer better approximations in a supervised setting, and the demand for these approaches has been the main driver of this expansion. Deep learning methods have shown promise in improving the precision of brain tumor detection and classification using magnetic resonance imaging (MRI). This study investigates the use of deep learning techniques, especially ResNet50, for brain tumor identification, and the possibility of automating the detection procedure. In this study, I utilized five transfer learning models, namely VGG16, VGG19, DenseNet121, ResNet50, and YOLO v4, of which ResNet50 provided the highest accuracy, 99.54%. The goal of the study is to guide researchers and medical professionals toward powerful brain tumor detection systems through this evaluation and analysis.
Tumor segmentation in medical imaging is crucial and relies on precise delineation. Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is widely used in clinical practice to detect metabolically active tumors. However, FDG-PET scans may misinterpret irregular glucose consumption in healthy or benign tissues as cancer. Combining PET with Computed Tomography (CT) can enhance tumor segmentation by integrating metabolic and anatomic information. FDG-PET/CT scans are pivotal for cancer staging and reassessment, utilizing radiolabeled fluorodeoxyglucose to highlight metabolically active regions. Accurately distinguishing tumor-specific uptake from physiological uptake in normal tissues is a challenging aspect of precise tumor segmentation. The AutoPET challenge addresses this by providing a dataset of 1014 FDG-PET/CT studies, encouraging advancements in accurate tumor segmentation and analysis within the FDG-PET/CT domain. Code: this https URL
Encoder-decoder networks have become a popular choice for various medical image segmentation tasks. When they are trained with a standard loss function, these networks are not explicitly enforced to preserve the shape integrity of an object in an image. However, this ability is important for obtaining more accurate results, especially when there is a low-contrast difference between the object and its surroundings. In response to this issue, this work introduces a new shape-aware loss function, which we name FourierLoss. This loss function relies on quantifying the shape dissimilarity between the ground truth and the predicted segmentation maps through Fourier descriptors calculated on their objects, and penalizing this dissimilarity in network training. Unlike previous studies, FourierLoss offers an adaptive loss function with trainable hyperparameters that control the importance of the level of shape detail that the network is enforced to learn during training. This control is achieved by the proposed adaptive loss update mechanism, which learns the hyperparameters end-to-end, simultaneously with the network weights, by backpropagation. As a result of using this mechanism, the network can dynamically shift its attention from learning the general outline of an object to learning the details of its contour points, or vice versa, in different training epochs. Working on 2879 computed tomography images of 93 subjects, our experiments revealed that the proposed adaptive shape-aware loss function led to statistically significantly better results for liver segmentation, compared to its counterparts.
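The Fourier descriptors underlying such a loss can be computed from a contour's complex representation; low-order coefficients encode the coarse outline and higher orders the finer contour detail. The invariance choices and the plain L2 distance below are illustrative, not FourierLoss's exact weighting:

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=8):
    """Treat an (N, 2) contour as a complex signal x + iy and take the
    magnitudes of its first few FFT coefficients as a shape signature."""
    z = contour[:, 0] + 1j * contour[:, 1]
    mags = np.abs(np.fft.fft(z))
    mags[0] = 0.0                      # drop DC term -> translation invariance
    if mags[1] > 0:
        mags = mags / mags[1]          # normalize -> scale invariance
    return mags[1:n_coeffs + 1]

def shape_dissimilarity(c1, c2, n_coeffs=8):
    """L2 distance between descriptor vectors: the kind of shape term a
    shape-aware loss penalizes during training."""
    return float(np.linalg.norm(
        fourier_descriptors(c1, n_coeffs) - fourier_descriptors(c2, n_coeffs)))
```

Truncating at `n_coeffs` is exactly the knob the adaptive mechanism tunes in spirit: few coefficients enforce only the general outline, more coefficients enforce fine contour detail.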
Multi-task learning (MTL) has shown great potential in medical image analysis, improving the generalizability of the learned features and the performance in individual tasks. However, most of the work on MTL focuses on either architecture design or gradient manipulation, while in both scenarios, features are learned in a competitive manner. In this work, we propose to formulate MTL as a multi/bi-level optimization problem, and therefore force features to learn from each task in a cooperative approach. Specifically, we update the sub-model for each task alternatively taking advantage of the learned sub-models of the other tasks. To alleviate the negative transfer problem during the optimization, we search for flat minima for the current objective function with regard to features from other tasks. To demonstrate the effectiveness of the proposed approach, we validate our method on three publicly available datasets. The proposed method shows the advantage of cooperative learning, and yields promising results when compared with the state-of-the-art MTL approaches. The code will be available online.
Biomedical research yields a wealth of information, much of which is only accessible through the literature. Consequently, literature search is an essential tool for building on prior knowledge in clinical and biomedical research. Although recent improvements in artificial intelligence have expanded functionality beyond keyword-based search, these advances may be unfamiliar to clinicians and researchers. In response, we present a survey of literature search tools tailored to both general and specific information needs in biomedicine, with the objective of helping readers efficiently fulfill their information needs. We first examine the widely used PubMed search engine, discussing recent improvements and continued challenges. We then describe literature search tools catering to five specific information needs: 1. Identifying high-quality clinical research for evidence-based medicine. 2. Retrieving gene-related information for precision medicine and genomics. 3. Searching by meaning, including natural language questions. 4. Locating related articles with literature recommendation. 5. Mining literature to discover associations between concepts such as diseases and genetic variants. Additionally, we cover practical considerations and best practices for choosing and using these tools. Finally, we provide a perspective on the future of literature search engines, considering recent breakthroughs in large language models such as ChatGPT. In summary, our survey provides a comprehensive view of biomedical literature search functionalities with 36 publicly available tools.
Supervised training of deep learning models for medical imaging applications requires a significant amount of labeled data. This poses a challenge, as the images must be annotated by medical professionals. To address this limitation, we introduce the Adaptive Locked Agnostic Network (ALAN), a concept involving self-supervised visual feature extraction using a large backbone model to produce anatomically robust semantic self-segmentation. In the ALAN methodology, this self-supervised training occurs only once on a large and diverse dataset. Due to the intuitive interpretability of the segmentation, downstream models tailored for specific tasks can be easily designed using white-box models with few parameters. This, in turn, opens up the possibility of communicating the inner workings of a model with domain experts and introducing prior knowledge into it. It also means that the downstream models become less data-hungry compared to fully supervised approaches. These characteristics make ALAN particularly well-suited for resource-scarce scenarios, such as costly clinical trials and rare diseases. In this paper, we apply the ALAN approach to three publicly available echocardiography datasets: EchoNet-Dynamic, CAMUS, and TMED-2. Our findings demonstrate that the self-supervised backbone model robustly identifies anatomical subregions of the heart in an apical four-chamber view. Building upon this, we design two downstream models, one for segmenting a target anatomical region, and a second for echocardiogram view classification.
Contrastive learning, a powerful technique for learning image-level representations from unlabeled data, offers a promising direction for resolving the dilemma between large-scale pre-training and limited labeled data. However, most existing contrastive learning strategies are designed mainly for downstream tasks on natural images, so they are sub-optimal, and can even be worse than learning from scratch, when applied directly to medical images, whose downstream task is usually segmentation. In this work, we propose a novel asymmetric contrastive learning framework named JCL for medical image segmentation with self-supervised pre-training. Specifically, (1) a novel asymmetric contrastive learning strategy is proposed to pre-train both the encoder and decoder simultaneously in one stage, providing better initialization for segmentation models; (2) a multi-level contrastive loss is designed to take the correspondence among feature-level, image-level, and pixel-level projections into account, ensuring that multi-level representations can be learned by the encoder and decoder during pre-training; (3) experiments on multiple medical image datasets indicate that our JCL framework outperforms existing SOTA contrastive learning strategies.
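One simple form of the contrastive objectives discussed above is the InfoNCE loss over paired embeddings. This NumPy sketch is a generic single-level version, not JCL's multi-level loss:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: each anchor's own positive should score highest among all
    positives in the batch. Rows of `anchors` and `positives` are paired."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # cosine similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())
```

A multi-level variant like the one described would apply a loss of this shape at feature-, image-, and pixel-level projections and sum the terms, so that both the encoder and the decoder receive a contrastive signal.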
Pancreatic cancer is a lethal form of cancer that significantly contributes to cancer-related deaths worldwide. Early detection is essential to improve patient prognosis and survival rates. Despite advances in medical imaging techniques, pancreatic cancer remains a challenging disease to detect. Endoscopic ultrasound (EUS) is the most effective diagnostic tool for detecting pancreatic cancer. However, it requires expert interpretation of complex ultrasound images to complete a reliable patient scan. To obtain complete imaging of the pancreas, practitioners must learn to guide the endoscope into multiple "EUS stations" (anatomical locations), which provide different views of the pancreas. This is a difficult skill to learn, involving over 225 proctored procedures with the support of an experienced doctor. We build an AI-assisted tool that uses deep learning to identify these stations in real time during EUS procedures. This computer-assisted diagnosis (CAD) tool will help train doctors more efficiently. Historically, the challenge in developing such a tool has been the amount of retrospective labeling required from trained clinicians. To solve this, we developed an open-source, user-friendly labeling web app that streamlines the process of annotating stations during the EUS procedure with minimal effort from the clinicians. Our research shows that training on only 43 procedures, with no hyperparameter fine-tuning, achieves a balanced accuracy of 90%, comparable to the current state of the art. In addition, we employ Grad-CAM, a visualization technique that provides clinicians with interpretable and explainable visualizations.
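Balanced accuracy, the metric reported above, is the mean of per-class recalls; a minimal implementation:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls. Unlike plain accuracy, a majority-class
    predictor is not rewarded, which matters when some classes (e.g. rarely
    visited stations) are underrepresented."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, predicting the majority class on a 3:1 imbalanced set scores 75% plain accuracy but only 50% balanced accuracy, since the minority class has zero recall.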
Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation, as measured by CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To the best of our knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.
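The view of iterative refinement as gradient descent with an exponentially decayed learning rate can be reproduced on a toy objective; the objective, step counts, and constants below are assumptions for illustration only:

```python
import numpy as np

def refine(x0, grad, steps=50, lr0=0.5, decay=0.9):
    """Gradient descent with an exponentially decayed learning rate:
    step t uses lr0 * decay**t, so early steps move far and later steps
    make only small refinements."""
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        x = x - lr0 * (decay ** t) * grad(x)
    return x

# Toy objective f(x) = 0.5 * ||x - target||^2, whose gradient is (x - target).
target = np.array([1.0, -2.0])
x_final = refine(np.zeros(2), lambda x: x - target)
```

Because the step sizes are summable, the iterates settle near a fixed point rather than oscillating, which mirrors why the editing process converges to a stable simulated image.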
The fine-grained medical action analysis task has recently received considerable attention from the pattern recognition community, but it faces a shortage of both data and algorithms. Cardiopulmonary Resuscitation (CPR) is an essential skill in emergency treatment. Currently, the assessment of CPR skills mainly depends on dummies and trainers, leading to high training costs and low efficiency. For the first time, this paper constructs a vision-based system to perform error action recognition and skill assessment in CPR. Specifically, we define 13 types of single-error actions and 74 types of composite error actions during external cardiac compression and then develop a video dataset named CPR-Coach. Taking CPR-Coach as a benchmark, this paper thoroughly investigates and compares the performance of existing action recognition models based on different data modalities. To solve the unavoidable Single-class Training & Multi-class Testing problem, we propose a human-cognition-inspired framework named ImagineNet to improve the model's multi-error recognition performance under restricted supervision. Extensive experiments verify the effectiveness of the framework. We hope this work can advance research on fine-grained medical action analysis and skill assessment. The CPR-Coach dataset and the code of ImagineNet are publicly available on GitHub.