This research evaluates the non-commercial open-source large language models (LLMs) Meditron, MedAlpaca, Mistral, and Llama-2 for their efficacy in interpreting medical guidelines stored in PDF format. As a specific test scenario, we applied these models to the guidelines for hypertension in children and adolescents provided by the European Society of Cardiology (ESC). Leveraging Streamlit, a Python library, we developed a user-friendly medical document chatbot tool (MedDoc-Bot). This tool enables authorized users to upload PDF files and pose questions, generating interpretive responses from four locally stored LLMs. To establish an evaluation benchmark, a pediatric expert formulated questions and reference responses drawn from the ESC guidelines, then rated the model-generated responses for fidelity and relevance. Additionally, we computed METEOR and chrF scores to assess the similarity of model responses to the reference answers. Our study found that Llama-2 and Mistral performed well in the metric evaluation; however, Llama-2 was slower when dealing with text and tabular data. In our human evaluation, responses generated by Mistral, Meditron, and Llama-2 exhibited reasonable fidelity and relevance. This study provides valuable insights into the strengths and limitations of LLMs for future developments in medical document interpretation. Open-Source Code: this https URL
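Of the two similarity metrics above, chrF is a character n-gram F-score and is easy to sketch from scratch. The following is a simplified illustration; real implementations such as sacreBLEU's differ in details (word n-gram extension, whitespace handling), so treat this as a sketch of the idea rather than the exact metric used in the paper:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with whitespace removed (a common chrF simplification).
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average character n-gram F-beta score over n = 1..max_n
    # (beta=2 weights recall twice as much as precision, as in chrF).
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return sum(scores) / len(scores) if scores else 0.0
```

A model response identical to the reference scores 1.0; partially overlapping responses score between 0 and 1.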
https://arxiv.org/abs/2405.03359
Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explores the intersection of two research trends: the integration of AR into image-guided surgery and the use of deep learning for point cloud registration. The main objective is to evaluate the feasibility of applying deep learning-based point cloud registration methods for image-to-patient registration in augmented reality-guided surgery. We created a dataset of point clouds from medical imaging and corresponding point clouds captured with a popular AR device, the HoloLens 2. We evaluate three well-established deep learning models in registering these data pairs. While we find that some deep learning methods show promise, we show that a conventional registration pipeline still outperforms them on our challenging dataset.
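The conventional registration pipeline referred to above is not specified in the abstract; as a rough illustration of the classical baseline family, a minimal point-to-point ICP (iterative closest point) with a Kabsch/SVD rigid fit can be sketched in NumPy. This is a generic textbook method, not necessarily the authors' exact pipeline:

```python
import numpy as np

def best_rigid_transform(src, dst):
    # Least-squares rotation R and translation t with dst ~= src @ R.T + t
    # (Kabsch algorithm via SVD of the cross-covariance matrix).
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    # Classic point-to-point ICP: alternate nearest-neighbour correspondence
    # and rigid refitting until the clouds align.
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for each current point.
        idx = np.argmin(((cur[:, None] - dst[None]) ** 2).sum(-1), axis=1)
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
    return cur
```

For a small rigid perturbation of well-separated points, the correspondences are found immediately and the fit is exact after one iteration.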
https://arxiv.org/abs/2405.03314
Brain disorders are a major challenge to global health, causing millions of deaths each year. Accurate diagnosis of these diseases relies heavily on advanced medical imaging techniques such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). However, the scarcity of annotated data poses a significant challenge in deploying machine learning models for medical diagnosis. To address this limitation, deep learning techniques have shown considerable promise. Domain adaptation techniques enhance a model's ability to generalize across imaging modalities by transferring knowledge from one domain (e.g., CT images) to another (e.g., MRI images). Such cross-modality adaptation is essential to improve the ability of models to consistently generalize across different imaging modalities. This study collected relevant resources from the Kaggle website and employed Maximum Mean Discrepancy (MMD) - a popular domain adaptation method - to reduce the differences between imaging domains. By combining MMD with Convolutional Neural Networks (CNNs), the accuracy and utility of the model are substantially enhanced. The strong experimental results highlight the great potential of data-driven domain adaptation techniques to improve diagnostic accuracy and efficiency, especially in resource-limited environments. By bridging the gap between different imaging modalities, this study aims to provide clinicians with more reliable diagnostic tools.
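The MMD criterion can be sketched with a Gaussian kernel: it compares the mean kernel similarity within each sample set against the cross-set similarity. This is a generic biased estimator of squared MMD, not necessarily the kernel or configuration used in the paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None] - Y[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy between the
    # distributions that generated samples X and Y; zero iff the kernel
    # mean embeddings coincide (e.g., identical samples).
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())
```

In domain-adaptive training, a term like `mmd2(features_ct, features_mri)` is typically added to the task loss so the CNN learns modality-invariant features.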
https://arxiv.org/abs/2405.03235
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
https://arxiv.org/abs/2405.03162
Skin lesion segmentation is a critical task in computer-aided diagnosis systems for dermatological diseases. Accurate segmentation of skin lesions from medical images is essential for early detection, diagnosis, and treatment planning. In this paper, we propose a new model for skin lesion segmentation, AC-MambaSeg, an enhanced model with a hybrid CNN-Mamba backbone that integrates advanced components such as the Convolutional Block Attention Module (CBAM), Attention Gate, and Selective Kernel Bottleneck. AC-MambaSeg leverages the Vision Mamba framework for efficient feature extraction, while CBAM and the Selective Kernel Bottleneck enhance its ability to focus on informative regions and suppress background noise. We evaluate the performance of AC-MambaSeg on diverse skin lesion image datasets, including ISIC-2018 and PH2, and compare it against existing segmentation methods. Our model shows promising potential for improving computer-aided diagnosis systems and facilitating early detection and treatment of dermatological diseases. Our source code will be made available at: this https URL.
https://arxiv.org/abs/2405.03011
Despite their improved capabilities in generation and reasoning, adapting large language models (LLMs) to the biomedical domain remains challenging due to their immense size and corporate privacy. In this work, we propose MedAdapter, a unified post-hoc adapter for test-time adaptation of LLMs towards biomedical applications. Instead of fine-tuning the entire LLM, MedAdapter effectively adapts the original model by fine-tuning only a small BERT-sized adapter to rank candidate solutions generated by LLMs. Experiments demonstrate that MedAdapter effectively adapts both white-box and black-box LLMs in biomedical reasoning, achieving average performance improvements of 25.48% and 11.31%, respectively, without requiring extensive computational resources or sharing data with third parties. MedAdapter also yields superior performance when combined with train-time adaptation, highlighting a flexible and complementary solution to existing adaptation methods. Faced with the challenges of balancing model performance, computational resources, and data privacy, MedAdapter provides an efficient, privacy-preserving, cost-effective, and transparent solution for adapting LLMs to the biomedical domain.
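MedAdapter's core mechanism, re-ranking candidate solutions generated by an LLM with a small learned scorer, can be sketched generically. The `scorer` below stands in for the trained BERT-sized adapter, which this sketch does not implement; the toy lexical-overlap scorer is purely illustrative:

```python
def rank_candidates(question, candidates, scorer):
    # Pick the candidate the scorer considers most plausible; the scorer
    # plays the role of the small trained adapter (higher = better).
    return max(candidates, key=lambda c: scorer(question, c))

def overlap_scorer(question, candidate):
    # Hypothetical stand-in scorer: reward lexical overlap with the
    # question. The real adapter is a learned ranking model.
    q = set(question.lower().split())
    c = set(candidate.lower().split())
    return len(q & c) / max(len(c), 1)
```

Because only the scorer is trained, the underlying LLM (white-box or black-box) stays frozen, which is the point of the post-hoc adaptation described above.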
https://arxiv.org/abs/2405.03000
In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 93.06% on a subset of the MedQA dataset that covers major respiratory diseases. This work paves the way for advancing the applications of LLM-powered agent techniques in medical scenarios.
https://arxiv.org/abs/2405.02957
Disease prediction holds considerable significance in modern healthcare because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches heavily rely on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a complex task from the standpoint of a patient, and such data are typically available only after a patient consultation. To make disease prediction available from the patient's side, we propose Personalized Medical Disease Prediction (PoMP), which predicts diseases using patient health narratives comprising textual descriptions and demographic information. By applying PoMP, patients can gain a clearer comprehension of their conditions, empowering them to directly seek appropriate medical specialists and thereby reducing the time spent navigating healthcare communication to locate suitable doctors. We conducted extensive experiments using real-world data from Haodf to showcase the effectiveness of PoMP.
https://arxiv.org/abs/2405.02935
Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic, with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon: due to this nature of the data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully exploits information from the high in-plane resolution to compensate for the low through-plane resolution. The through-plane branch supplements the limited information contained in the low through-plane resolution with the high in-plane resolution and enables continual and diverse feature learning. The in-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, which noticeably outperforms state-of-the-art super-resolution, video frame interpolation, and slice interpolation methods by a large margin. We achieve 43.90 dB in PSNR, with at least a 1.14 dB improvement under an upscale factor of $\times$2 on the MSD dataset, with faster inference. Code is available at this https URL.
https://arxiv.org/abs/2405.02857
Brain tumor segmentation is a fundamental step in assessing a patient's cancer progression. However, manual segmentation demands significant expert time to identify tumors in 3D multimodal brain MRI scans accurately. This reliance on manual segmentation makes the process prone to intra- and inter-observer variability. This work proposes a brain tumor segmentation method as part of the BraTS-GoAT challenge. The task is to segment tumors in brain MRI scans automatically from various populations, such as adults, pediatrics, and underserved sub-Saharan Africa. We employ a recent CNN architecture for medical image segmentation, namely MedNeXt, as our baseline, and we implement extensive model ensembling and postprocessing for inference. Our experiments show that our method performs well on the unseen validation set with an average DSC of 85.54% and HD95 of 27.88. The code is available on this https URL.
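The model-ensembling step mentioned above can be sketched as simple probability averaging across ensemble members followed by a per-voxel argmax. This is a common baseline recipe, assumed here for illustration; the paper's actual ensembling and postprocessing may differ:

```python
import numpy as np

def ensemble_predict(prob_maps):
    # prob_maps: list of per-model arrays shaped (classes, D, H, W) holding
    # softmax probabilities. Average across models, then pick the most
    # probable class at every voxel to get a label volume (D, H, W).
    mean_probs = np.mean(prob_maps, axis=0)
    return mean_probs.argmax(axis=0)
```

Averaging before the argmax lets confident models outvote uncertain ones voxel by voxel, which is why probability averaging usually beats majority voting on hard labels.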
https://arxiv.org/abs/2405.02852
Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus', etc.), enabling a range of classification-based applications including automated triage. Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.
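Once trained, the scoring step described above reduces to comparing a scan's predicted text vector against encoded query sentences. A minimal sketch, where the embeddings are placeholders standing in for NeuroBERT sentence vectors and the CNN's predicted vector:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def abnormality_scores(scan_vec, queries):
    # queries: {sentence: embedding}. Higher cosine similarity suggests the
    # scan better matches the finding the sentence describes.
    return {s: cosine(scan_vec, e) for s, e in queries.items()}
```

Thresholding these scores per query sentence turns the regression model into the classification-style triage tool described above.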
https://arxiv.org/abs/2405.02782
A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting the medical information required for further treatment planning or even research is a time-consuming, labour-intensive task that consumes doctors' valuable time. In this demo paper we present MedPromptExtract, an automated tool that uses a combination of semi-supervised learning, large language models, natural language processing, and prompt engineering to convert unstructured medical records into structured data amenable to further analysis.
https://arxiv.org/abs/2405.02664
Conformal Prediction (CP) quantifies network uncertainty by building a small prediction set with a pre-defined probability that the correct class is within this set. In this study we tackle the problem of CP calibration based on a validation set with noisy labels. We introduce a conformal score that is robust to label noise. The noise-free conformal score is estimated using the noisy labeled data and the noise level. In the test phase the noise-free score is used to form the prediction set. We applied the proposed algorithm to several standard medical imaging classification datasets. We show that our method outperforms current methods by a large margin, in terms of the average size of the prediction set, while maintaining the required coverage.
https://arxiv.org/abs/2405.02648
Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
https://arxiv.org/abs/2405.02583
Surgical action localization is a challenging computer vision problem. While it has promising applications including automated training of surgery procedures, surgical workflow optimization, etc., appropriate model design is pivotal to accomplishing this task. Moreover, the lack of suitable medical datasets adds an additional layer of complexity. To that effect, we introduce a new complex dataset of nephrectomy surgeries called UroSlice. To perform action localization from these videos, we propose a novel model termed `ViTALS' (Vision Transformer for Action Localization in Surgical Nephrectomy). Our model incorporates hierarchical dilated temporal convolution layers and inter-layer residual connections to capture the temporal correlations at finer as well as coarser granularities. The proposed approach achieves state-of-the-art performance on the Cholec80 and UroSlice datasets (89.8% and 66.1% accuracy, respectively), validating its effectiveness.
https://arxiv.org/abs/2405.02571
As generative artificial intelligence (AI), particularly Large Language Models (LLMs), continues to permeate healthcare, it remains crucial to supplement traditional automated evaluations with human expert evaluation. Understanding and evaluating the generated texts is vital for ensuring safety, reliability, and effectiveness. However, the cumbersome, time-consuming, and non-standardized nature of human evaluation presents significant obstacles to the widespread adoption of LLMs in practice. This study reviews existing literature on human evaluation methodologies for LLMs within healthcare. We highlight a notable need for a standardized and consistent human evaluation approach. Our extensive literature search, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, spans publications from January 2018 to February 2024. This review provides a comprehensive overview of the human evaluation approaches used in diverse healthcare applications. This analysis examines the human evaluation of LLMs across various medical specialties, addressing factors such as evaluation dimensions, sample types and sizes, the selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis of the results. Drawing from diverse evaluation strategies highlighted in these studies, we propose a comprehensive and practical framework for human evaluation of generative LLMs, named QUEST: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence. This framework aims to improve the reliability, generalizability, and applicability of human evaluation of generative LLMs in different healthcare applications by defining clear evaluation dimensions and offering detailed guidelines.
https://arxiv.org/abs/2405.02559
Computed tomography (CT) is a widely used non-invasive medical imaging technique for disease diagnosis. The diagnostic accuracy is often affected by image resolution, which can be insufficient in practice. For medical CT images, the through-plane resolution is often worse than the in-plane resolution and there can be overlap between slices, causing difficulties in diagnoses. Self-supervised methods for through-plane resolution enhancement, which train on in-plane images and infer on through-plane images, have shown promise for both CT and MRI imaging. However, existing self-supervised methods either neglect overlap or can only handle specific cases with fixed combinations of resolution and overlap. To address these limitations, we propose a self-supervised method called SR4ZCT. It employs the same off-axis training approach while being capable of handling arbitrary combinations of resolution and overlap. Our method explicitly models the relationship between resolutions and voxel spacings of different planes to accurately simulate training images that match the original through-plane images. We highlight the significance of accurate modeling in self-supervised off-axis training and demonstrate the effectiveness of SR4ZCT using a real-world dataset.
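The key modeling step above — simulating low through-plane resolution, possibly overlapping, slices from high-resolution in-plane data — can be sketched as block averaging with a stride smaller than the slice thickness. This is a simplified illustration of the idea, not the paper's exact forward model:

```python
import numpy as np

def simulate_thick_slices(volume, thickness, spacing):
    # volume: (n_slices, H, W) thin-slice stack. Each output slice averages
    # `thickness` adjacent thin slices; spacing < thickness yields
    # overlapping slices, spacing == thickness yields abutting ones.
    starts = range(0, volume.shape[0] - thickness + 1, spacing)
    return np.stack([volume[s:s + thickness].mean(axis=0) for s in starts])
```

Training pairs built this way from in-plane images let the network learn to invert the thickness/overlap degradation before it is applied off-axis to the real through-plane direction.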
https://arxiv.org/abs/2405.02515
Currently, the foundation models represented by large language models have made dramatic progress and are used in a very wide range of domains, including 2D and 3D vision. As one of the important application domains of foundation models, earth observation has attracted attention and various approaches have been developed. When considering earth observation as a single image capture, earth observation imagery can be processed as an image with three or more channels, and when it comes with multiple image captures of different timestamps at one location, the temporal observation can be considered as a set of continuous images resembling video frames or medical scan slices. This paper presents Spatio-Temporal SwinMAE (ST-SwinMAE), an architecture which particularly focuses on representation learning for spatio-temporal image processing. Specifically, it uses a hierarchical Masked Auto-encoder (MAE) with Video Swin Transformer blocks. With this architecture, we present a pretrained model named Degas 100M as a geospatial foundation model. Also, we propose an approach for transfer learning with Degas 100M, in which both the pretrained encoder and decoder of the MAE are utilized, with skip connections added between them to achieve multi-scale information communication, forming an architecture named Spatio-Temporal SwinUNet (ST-SwinUNet). Our approach shows significant performance improvements over existing state-of-the-art foundation models. Specifically, for transfer learning on the land cover downstream task of the PhilEO Bench dataset, it shows 10.4\% higher accuracy on average compared with other geospatial foundation models.
https://arxiv.org/abs/2405.02512
Computed Tomography (CT) is pivotal in industrial quality control and medical diagnostics. Sparse-view CT, offering reduced ionizing radiation, faces challenges due to its under-sampled nature, leading to ill-posed reconstruction problems. Recent advancements in Implicit Neural Representations (INRs) have shown promise in addressing sparse-view CT reconstruction. Recognizing that CT often involves scanning similar subjects, we propose a novel approach to improve reconstruction quality through joint reconstruction of multiple objects using INRs. This approach can potentially leverage both the strengths of INRs and the statistical regularities across multiple objects. While current INR joint reconstruction techniques primarily focus on accelerating convergence via meta-initialization, they are not specifically tailored to enhance reconstruction quality. To address this gap, we introduce a novel INR-based Bayesian framework integrating latent variables to capture the inter-object relationships. These variables serve as a dynamic reference throughout the optimization, thereby enhancing individual reconstruction fidelity. Our extensive experiments, which assess various key factors such as reconstruction quality, resistance to overfitting, and generalizability, demonstrate significant improvements over baselines in common numerical metrics. This underscores a notable advancement in CT reconstruction methods.
https://arxiv.org/abs/2405.02509
Online optimisation facilitates the solution of dynamic inverse problems, such as image stabilisation, fluid flow monitoring, and dynamic medical imaging. In this paper, we improve upon previous work on predictive online primal-dual methods on two fronts. Firstly, we provide a more concise analysis that symmetrises previously unsymmetric regret bounds, and relaxes previous restrictive conditions on the dual predictor. Secondly, based on the latter, we develop several improved dual predictors. We numerically demonstrate their efficacy in image stabilisation and dynamic positron emission tomography.
https://arxiv.org/abs/2405.02497