Diffusion-based models for text-to-image generation have gained immense popularity due to recent advancements in efficiency, accessibility, and quality. Although it is becoming increasingly feasible to perform inference with these systems using consumer-grade GPUs, training them from scratch still requires access to large datasets and significant computational resources. In the case of medical image generation, the availability of large, publicly accessible datasets that include text reports is limited due to legal and ethical concerns. While training a diffusion model on a private dataset may address this issue, it is not always feasible for institutions lacking the necessary computational resources. This work demonstrates that pre-trained Stable Diffusion models, originally trained on natural images, can be adapted to various medical imaging modalities by training text embeddings with textual inversion. In this study, we conducted experiments using medical datasets comprising only 100 samples from three medical modalities. Embeddings were trained in a matter of hours, while still retaining diagnostic relevance in image generation. Experiments were designed to achieve several objectives. Firstly, we fine-tuned the training and inference processes of textual inversion, revealing that larger embeddings and more examples are required. Secondly, we validated our approach by demonstrating a 2% increase in diagnostic accuracy (AUC) for detecting prostate cancer on MRI, a challenging multi-modal imaging modality, from 0.78 to 0.80. Thirdly, we performed simulations by interpolating between healthy and diseased states, combining multiple pathologies, and inpainting to show embedding flexibility and control of disease appearance. Finally, the embeddings trained in this study are small (less than 1 MB), which facilitates easy sharing of medical data with reduced privacy concerns.
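To make the mechanism concrete, here is a minimal PyTorch sketch of textual inversion's core idea: a single new token embedding is the only trainable parameter, optimized against a frozen denoiser. The toy denoiser, sizes, and training targets below are illustrative assumptions, not the authors' implementation (which builds on the full Stable Diffusion pipeline).

```python
import torch
import torch.nn as nn

embed_dim = 768                            # CLIP ViT-L/14 text width (assumed)
denoiser = nn.Sequential(                  # toy stand-in for the frozen U-Net
    nn.Linear(embed_dim, 256), nn.SiLU(), nn.Linear(256, embed_dim))
for p in denoiser.parameters():
    p.requires_grad_(False)                # the generator itself is never updated

# the learnable pseudo-word embedding, e.g. "<prostate-mri>"
new_token = nn.Parameter(torch.randn(embed_dim) * 0.02)
opt = torch.optim.AdamW([new_token], lr=5e-3)

for step in range(200):
    noise = torch.randn(embed_dim)         # stand-in for sampled diffusion noise
    pred = denoiser(new_token + noise)     # condition the denoiser on the token
    loss = nn.functional.mse_loss(pred, noise)   # noise-prediction objective
    opt.zero_grad(); loss.backward(); opt.step()

torch.save(new_token.detach(), "embedding.pt")   # the shareable sub-1 MB artifact
```

Because only `embedding.pt` changes, sharing a trained concept means sharing kilobytes rather than a model or patient images.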
https://arxiv.org/abs/2303.13430
Multiple instance learning (MIL) models have been extensively used in pathology to predict biomarkers and risk-stratify patients from gigapixel-sized images. Machine learning problems in medical imaging often deal with rare diseases, making it important for these models to work in a label-imbalanced setting. Furthermore, these imbalances can occur in out-of-distribution (OOD) datasets when the models are deployed in the real world. We leverage the idea that decoupling feature and classifier learning can lead to improved decision boundaries for label-imbalanced datasets. To this end, we investigate the integration of supervised contrastive learning with multiple instance learning (SC-MIL). Specifically, we propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning. We perform experiments with different imbalance settings for two well-studied problems in cancer pathology: subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma. SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalance settings.
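The contrastive term can be sketched in a few lines. Below is a generic supervised contrastive loss over bag-level embeddings, plus an assumed mixing weight `alpha` standing in for the progressive transition from representation to classifier learning; the authors' exact formulation and schedule may differ.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over bag-level embeddings (one row per bag)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))             # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)             # avoid -inf * 0 = nan
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()

# toy usage: 8 bags from 2 subtypes, imbalanced 6:2; `alpha` ramps training focus
# from representation learning to classifier learning (schedule is an assumption)
bags = torch.randn(8, 128)
labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
logits = torch.randn(8, 2, requires_grad=True)
alpha = 0.3
total = (1 - alpha) * supcon_loss(bags, labels) + alpha * F.cross_entropy(logits, labels)
```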
https://arxiv.org/abs/2303.13405
Interactive segmentation enables users to segment objects as needed by providing cues, bringing human-computer interaction to many fields, such as image editing and medical image analysis. Typically, massive and expensive pixel-level annotations are required to train deep models through object-oriented interactions with manually labeled object masks. In this work, we reveal that informative interactions can be simulated through semantically consistent yet diverse region exploration in an unsupervised paradigm. Concretely, we introduce a Multi-granularity Interaction Simulation (MIS) approach to open up a promising direction for unsupervised interactive segmentation. Drawing on the high-quality dense features produced by recent self-supervised models, we propose to gradually merge patches or regions with similar features into more extensive regions; every merged region thus serves as a semantically meaningful multi-granularity proposal. By randomly sampling these proposals and simulating possible interactions based on them, we provide meaningful interactions at multiple granularities to teach the model to understand interactions. Our MIS significantly outperforms non-deep-learning unsupervised methods and, without any annotation, is even comparable with some previous deeply supervised methods.
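A simplified version of the merging step might look as follows. The real method merges regions over dense self-supervised features at several granularities; this sketch ignores spatial adjacency and uses a plain similarity threshold, both simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def merge_regions(feats: torch.Tensor, thresh: float = 0.85) -> list[int]:
    """Single-linkage merging of patch features into region proposals.

    feats: (N, D) dense patch features from a self-supervised backbone.
    Lowering `thresh` produces coarser (larger) regions.
    """
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T
    labels = list(range(len(feats)))        # every patch starts as its own region
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if sim[i, j] > thresh:          # similar enough -> merge the two regions
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

# sampling proposals at several thresholds yields multi-granularity proposals
feats = torch.randn(64, 384)                # e.g. an 8x8 grid of DINO patch features
proposals = {t: merge_regions(feats, t) for t in (0.9, 0.8, 0.7)}
```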
https://arxiv.org/abs/2303.13399
Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which often is scarce in the medical domain. Zero-shot methods address this challenge by allowing a flexible adaptation to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis into the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.
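The classify-by-description pattern can be illustrated with a contrastive prompt pair per descriptor. The descriptors, prompt templates, stand-in text encoder, and mean pooling below are all assumptions for illustration; only the overall pattern follows the abstract.

```python
import zlib
import torch
import torch.nn.functional as F

# hypothetical descriptors a radiologist might check for one finding
descriptors = [
    "a visceral pleural line",
    "absent lung markings peripheral to the pleural line",
    "a deep sulcus sign",
]

def text_encoder(prompt: str) -> torch.Tensor:
    """Deterministic stand-in for CLIP's text tower (sketch only)."""
    g = torch.Generator().manual_seed(zlib.crc32(prompt.encode()))
    return F.normalize(torch.randn(512, generator=g), dim=0)

def diagnosis_probability(image_emb: torch.Tensor) -> torch.Tensor:
    """Contrast 'present' vs 'absent' prompts per descriptor, then pool."""
    probs = []
    for d in descriptors:
        pos = image_emb @ text_encoder(f"There is {d}.")
        neg = image_emb @ text_encoder(f"There is no {d}.")
        probs.append(torch.softmax(torch.stack([pos, neg]), dim=0)[0])
    return torch.stack(probs).mean()        # simple mean pooling (assumed)

image_emb = F.normalize(torch.randn(512), dim=0)   # stand-in image embedding
print(diagnosis_probability(image_emb))
```

The per-descriptor probabilities are themselves the explanation: a clinician can inspect which observations drove the final score.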
https://arxiv.org/abs/2303.13391
Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation, which enables data augmentation for self-finetuning, and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.
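The NLGU strategy can be pictured as emitting each example in both directions for one multi-task seq2seq model: NLU (predict the label) and NLG (generate in-domain data from the label). The prompt templates below are illustrative assumptions, not the paper's exact formats.

```python
def nlgu_examples(premise: str, hypothesis: str, label: str) -> list[tuple[str, str]]:
    """Format one NLI example both ways for joint T5-style training (sketch)."""
    nlu = (f"nli premise: {premise} hypothesis: {hypothesis}", label)
    nlg = (f"generate {label} hypothesis: {premise}", hypothesis)
    return [nlu, nlg]

pairs = nlgu_examples(
    "No focal consolidation is seen.", "The lungs are clear.", "entailment")
# NLG outputs can then be fed back as augmented data for self-finetuning
```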
https://arxiv.org/abs/2303.13386
This short technical report demonstrates a simple technique that yields state-of-the-art results in medical image-text matching tasks. We analyze the use of OpenAI's CLIP, a general image-text matching model, and observe that CLIP's limited textual input size has a negative impact on downstream performance in the medical domain, where encoding longer textual contexts is often required. We thus train and release ClipMD, which is trained with a simple sliding-window technique to encode textual captions. ClipMD was tested on two medical image-text datasets and compared with other image-text matching models. The results show that ClipMD outperforms the other models on both datasets by a large margin. We make our code and pretrained model publicly available.
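A plausible reading of the sliding-window trick, sketched below: chunk the token sequence into overlapping windows that each fit CLIP's 77-token limit, encode each window, and pool. Window size, stride, and mean pooling are assumptions here, not the paper's confirmed choices.

```python
import torch

def encode_long_caption(tokens, encode_window, window=77, stride=39):
    """Encode a caption longer than the 77-token limit with overlapping windows."""
    stop = max(1, len(tokens) - window + stride)
    chunks = [tokens[i:i + window] for i in range(0, stop, stride)]
    embs = torch.stack([encode_window(chunk) for chunk in chunks])
    return embs.mean(dim=0)                  # pool window embeddings (assumed mean)

encode = lambda chunk: torch.randn(512)      # stand-in for CLIP's text tower
caption_tokens = list(range(200))            # a 200-token caption
emb = encode_long_caption(caption_tokens, encode)
```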
https://arxiv.org/abs/2303.13340
Modern surgeries are performed in complex and dynamic settings, including ever-changing interactions between medical staff, patients, and equipment. The holistic modeling of the operating room (OR) is, therefore, a challenging but essential task, with the potential to optimize the performance of surgical teams and aid in developing new surgical technologies to improve patient outcomes. The holistic representation of surgical scenes as semantic scene graphs (SGG), where entities are represented as nodes and relations between them as edges, is a promising direction for fine-grained semantic OR understanding. We propose, for the first time, the use of temporal information for more accurate and consistent holistic OR modeling. Specifically, we introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction. We design an end-to-end architecture that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images. We evaluate our method on the 4D-OR dataset and demonstrate that integrating temporality leads to more accurate and consistent results, achieving a +5% increase and a new SOTA of 0.88 in macro F1. This work opens the path for representing the entire surgery history with memory scene graphs and improves the holistic understanding in the OR. Introducing scene graphs as memory representations can offer a valuable tool for many temporal understanding tasks.
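One way to picture the memory mechanism: embed each previous scene graph, roll the embeddings through a recurrent memory, and fuse the memory state with the current frame's features. This GRU-based sketch is a generic stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MemorySceneGraphFusion(nn.Module):
    """Fuse a rolling memory of past scene-graph embeddings with current visual
    features (minimal sketch; the paper's end-to-end design is not reproduced)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.memory = nn.GRUCell(dim, dim)   # rolls past scene-graph embeddings
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, visual_feat, prev_graph_emb, hidden):
        hidden = self.memory(prev_graph_emb, hidden)             # update memory
        fused = self.fuse(torch.cat([visual_feat, hidden], -1))  # fuse with vision
        return fused, hidden

m = MemorySceneGraphFusion()
h = torch.zeros(1, 256)
feat, graph = torch.randn(1, 256), torch.randn(1, 256)
out, h = m(feat, graph, h)   # `out` would feed the relation-prediction head
```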
https://arxiv.org/abs/2303.13293
With the increasing ubiquity of cameras and smart sensors, humanity is generating data at an exponential rate. Access to this trove of information, often covering yet-underrepresented use cases (e.g., AI in medical settings), could fuel a new generation of deep-learning tools. However, eager data scientists should first provide satisfactory guarantees w.r.t. the privacy of individuals present in these untapped datasets. This is especially important for images or videos depicting faces, as their biometric information is the target of most identification methods. While a variety of solutions have been proposed to de-identify such images, they often corrupt other non-identifying facial attributes that would be relevant for downstream tasks. In this paper, we propose Disguise, a novel algorithm to seamlessly de-identify facial images while ensuring the usability of the altered data. Unlike prior art, we ground our solution in both the differential-privacy and ensemble-learning research domains. Our method extracts and swaps depicted identities with fake ones, synthesized via variational mechanisms to maximize obfuscation and non-invertibility, while leveraging supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior art across various downstream tasks.
https://arxiv.org/abs/2303.13269
Universal anomaly detection still remains a challenging problem in machine learning and medical image analysis. It is possible to learn an expected distribution from a single class of normative samples, e.g., through epistemic uncertainty estimates, auto-encoding models, or from synthetic anomalies in a self-supervised way. The performance of self-supervised anomaly detection approaches is still inferior compared to methods that use examples from known unknown classes to shape the decision boundary. However, outlier exposure methods often do not identify unknown unknowns. Here we discuss an improved self-supervised single-class training strategy that supports the approximation of probabilistic inference with loosened feature locality constraints. We show that up-scaling of gradients with histogram-equalised images is beneficial for recently proposed self-supervision tasks. Our method is integrated into several out-of-distribution (OOD) detection models, and we show evidence that our method outperforms the state-of-the-art on various benchmark datasets. Source code will be publicly available by the time of the conference.
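The preprocessing half of the recipe is easy to sketch: histogram-equalise the input before the self-supervised task. The gradient up-scaling scheme itself is not specified in the abstract, so only the equalisation step is shown, in a simple torch-only form.

```python
import torch

def hist_equalize(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    """Histogram-equalise a single-channel image with values in [0, 1]."""
    flat = (img.clamp(0, 1) * (bins - 1)).long().flatten()
    hist = torch.bincount(flat, minlength=bins).float()
    cdf = hist.cumsum(0)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-8)   # remap to [0, 1]
    return cdf[flat].reshape(img.shape)

img = torch.rand(1, 128, 128)
eq = hist_equalize(img)   # feed `eq` to the self-supervised anomaly task
```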
https://arxiv.org/abs/2303.13227
Out-of-distribution (OOD) medical images are frequently encountered, e.g., because of site or scanner differences, or image corruption. OOD images come with a risk of incorrect image segmentation, potentially negatively affecting downstream diagnoses or treatment. To ensure robustness to such incorrect segmentations, we propose Laplacian Segmentation Networks (LSN) that jointly model epistemic (model) and aleatoric (data) uncertainty in image segmentation. We capture data uncertainty with a spatially correlated logit distribution. For model uncertainty, we propose the first Laplace approximation of the weight posterior that scales to large neural networks with skip connections and high-dimensional outputs. Empirically, we demonstrate that modelling spatial pixel correlation allows the Laplacian Segmentation Network to successfully assign high epistemic uncertainty to out-of-distribution objects appearing within images.
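The aleatoric half can be sketched with a low-rank parameterisation of a spatially correlated logit distribution, a common construction; the paper's exact parameterisation is not given in the abstract, so treat the factorisation below as an assumption.

```python
import torch

def sample_correlated_logits(mean: torch.Tensor, cov_factor: torch.Tensor, n: int = 8):
    """Sample logit maps whose noise is correlated across pixels via a low-rank
    covariance P @ P.T (assumed parameterisation).

    mean:       (C, H, W) predicted logit mean
    cov_factor: (C*H*W, R) low-rank factor P
    """
    C, H, W = mean.shape
    eps = torch.randn(n, cov_factor.shape[1])          # (n, R) standard normals
    samples = mean.flatten() + eps @ cov_factor.T      # (n, C*H*W)
    return samples.reshape(n, C, H, W)

mu = torch.zeros(2, 8, 8)                  # 2-class toy logit mean
P = 0.1 * torch.randn(2 * 8 * 8, 10)       # rank-10 spatial correlation factor
draws = sample_correlated_logits(mu, P)
probs = draws.softmax(dim=1).mean(dim=0)   # Monte-Carlo marginal class probabilities
```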
https://arxiv.org/abs/2303.13123
The advent of the Vision Transformer (ViT) has brought substantial advancements in 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers due to their results being comparable to those of ViT, albeit without the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the advantages of convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by utilizing both 2D and 3D CNNs to extract local information. In addition, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by capturing long-range dependence while retaining positional information. Extensive experimental results validate that PHNet outperforms the state-of-the-art methods on two public datasets, namely COVID-19-20 and Synapse. Moreover, the ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNN and MLP. The code will be accessible to the public upon acceptance.
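The permute-MLP idea in general form: mix tokens along each spatial axis with a linear layer, which provides long-range context along that axis while keeping positions intact, with no self-attention. This is a sketch of the general mechanism, not the paper's exact MLPP module.

```python
import torch
import torch.nn as nn

class PermuteMLP3D(nn.Module):
    """Toy axis-permuting MLP for (B, D, H, W, C) volumes: each linear layer
    mixes tokens along one axis, giving long-range dependence per axis."""
    def __init__(self, d: int, h: int, w: int, dim: int):
        super().__init__()
        self.mix_d = nn.Linear(d, d)
        self.mix_h = nn.Linear(h, h)
        self.mix_w = nn.Linear(w, w)
        self.mix_c = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, D, H, W, C)
        # permute the target axis to the last position, mix, permute back
        x = x + self.mix_d(x.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)
        x = x + self.mix_h(x.permute(0, 1, 3, 4, 2)).permute(0, 1, 4, 2, 3)
        x = x + self.mix_w(x.permute(0, 1, 2, 4, 3)).permute(0, 1, 2, 4, 3)
        return x + self.mix_c(x)             # channel mixing, as in standard MLPs

y = PermuteMLP3D(8, 16, 16, 32)(torch.randn(1, 8, 16, 16, 32))
```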
https://arxiv.org/abs/2303.13111
Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: the proposed OCELOT, the public TIGER, and the internal CARP datasets. On the OCELOT test set in particular, we show an improvement of up to 6.79 in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at this https URL, are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computational pathology.
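The multi-task setup can be pictured as one shared encoder with a cell head and a tissue head trained jointly on the overlapping annotations; the tiny architecture below is a generic sketch, not the paper's models.

```python
import torch
import torch.nn as nn

class CellTissueNet(nn.Module):
    """Shared encoder with separate cell-detection and tissue-segmentation heads
    (generic multi-task sketch)."""
    def __init__(self, ch: int = 32, n_tissue: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.cell_head = nn.Conv2d(ch, 1, 1)        # cell-centre heatmap
        self.tissue_head = nn.Conv2d(ch, n_tissue, 1)

    def forward(self, x):
        f = self.encoder(x)
        return self.cell_head(f), self.tissue_head(f)

model = CellTissueNet()
cells, tissue = model(torch.randn(1, 3, 128, 128))
# joint objective: loss = cell_loss + lam * tissue_loss (lam is a tuning knob)
```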
https://arxiv.org/abs/2303.13110
Recent advances in semi-supervised learning have significantly boosted the performance of 3D semi-supervised medical image segmentation. Compared with 2D images, 3D medical volumes carry information from different directions, e.g., the transverse, sagittal, and coronal planes, naturally providing complementary views. These complementary views and the intrinsic similarity among adjacent 3D slices inspire us to develop a novel annotation scheme and a corresponding semi-supervised model for effective segmentation. Specifically, we first propose orthogonal annotation, labeling only two orthogonal slices in each labeled volume, which significantly relieves the annotation burden. We then perform registration to obtain initial pseudo labels for the sparsely labeled volumes. Subsequently, by introducing unlabeled volumes, we propose a dual-network paradigm named Dense-Sparse Co-training (DeSCO) that exploits dense pseudo labels in the early stage and sparse labels in the later stage, while enforcing consistent outputs from the two networks. Experimental results on three benchmark datasets validate our effectiveness in performance and efficiency in annotation. For example, with only 10 annotated slices, our method reaches a Dice score of up to 86.93% on the KiTS19 dataset.
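The annotation scheme itself is easy to make concrete: in a (D, H, W) volume, only one slice in each of two orthogonal planes carries labels. The axis conventions in the sketch below are assumptions.

```python
import torch

def orthogonal_annotation_mask(volume_shape, z_idx: int, x_idx: int) -> torch.Tensor:
    """Supervision mask for labelling one transverse and one sagittal slice of a
    (D, H, W) volume, i.e. the orthogonal-annotation idea in mask form."""
    D, H, W = volume_shape
    mask = torch.zeros(D, H, W, dtype=torch.bool)
    mask[z_idx, :, :] = True      # one labelled transverse slice
    mask[:, :, x_idx] = True      # one labelled sagittal slice
    return mask

mask = orthogonal_annotation_mask((64, 128, 128), z_idx=32, x_idx=64)
print(mask.float().mean())        # fraction of voxels needing manual labels (~2%)
```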
https://arxiv.org/abs/2303.13090
Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased output variance, resulting in notably divergent outputs even when prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively curbs variance for various LLMs, providing a more uniform and dependable solution for summarizing vital medical information.
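The soft-prompt ingredient, in generic form: learnable continuous vectors are prepended to the token embeddings of a frozen LM. The layer below is a standard soft-prompt sketch; the calibration procedure that SPeC builds on top of this is not reproduced from the abstract.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prepend learnable soft-prompt vectors to the input embeddings of a
    frozen LM (generic soft-prompt layer)."""
    def __init__(self, n_tokens: int = 20, dim: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(0.02 * torch.randn(n_tokens, dim))

    def forward(self, input_embs: torch.Tensor) -> torch.Tensor:   # (B, T, dim)
        b = input_embs.size(0)
        return torch.cat([self.prompt.expand(b, -1, -1), input_embs], dim=1)

sp = SoftPrompt()
x = torch.randn(2, 50, 768)   # embedded clinical-note tokens (stand-in)
y = sp(x)                     # (2, 70, 768): 20 soft tokens + 50 real tokens
```

Only the soft-prompt parameters are trained, which is what makes the approach model-agnostic across frozen LLMs.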
https://arxiv.org/abs/2303.13035
The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights into their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded in metrics that matter in healthcare.
https://arxiv.org/abs/2303.12961
Surgical scene understanding is a key prerequisite for context-aware decision support in the operating room. While deep learning-based approaches have already reached or even surpassed human performance in various fields, the task of surgical action recognition remains a major challenge. With this contribution, we are the first to investigate the concept of self-distillation as a means of addressing class imbalance and potential label ambiguity in surgical video analysis. Our proposed method is a heterogeneous ensemble of three models that use Swin Transformers as the backbone and the concepts of self-distillation and multi-task learning as core design choices. According to ablation studies performed with the CholecT45 challenge data via cross-validation, the biggest performance boost is achieved by the usage of soft labels obtained by self-distillation. External validation of our method on an independent test set was achieved by providing a Docker container of our inference model to the challenge organizers. According to their analysis, our method outperforms all other solutions submitted to the latest challenge in the field. Our approach thus shows the potential of self-distillation for becoming an important tool in medical image analysis applications.
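The soft-label ingredient can be sketched as a standard distillation objective where the teacher is an earlier snapshot of the same model. The temperature, weighting, and blend with hard labels below are assumptions, not the authors' reported settings.

```python
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, hard_labels,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label CE with KL to the teacher's softened predictions;
    in self-distillation the teacher is a snapshot of the same model."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(4, 10, requires_grad=True)   # student logits
t = torch.randn(4, 10)                       # frozen teacher-snapshot logits
loss = self_distill_loss(s, t, torch.tensor([0, 1, 2, 3]))
```

Soft targets spread probability mass over plausible classes, which is one reason they help with class imbalance and label ambiguity.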
https://arxiv.org/abs/2303.12915
Electronic medical records (EMRs) are stored in relational databases. It can be challenging to access the required information if the user is unfamiliar with the database schema or general database fundamentals. Hence, researchers have explored text-to-SQL generation methods that provide healthcare professionals direct access to EMR data without needing a database expert. However, currently available datasets have been essentially "solved", with state-of-the-art models achieving accuracy greater than or near 90%. In this paper, we show that there is still a long way to go before text-to-SQL generation in the medical domain is solved. To show this, we create new splits of the existing medical text-to-SQL dataset MIMICSQL that better measure the generalizability of the resulting models. We evaluate state-of-the-art language models on our new splits, showing substantial drops in performance, with accuracy dropping from up to 92% to 28%, thus showing substantial room for improvement. Moreover, we introduce a novel data augmentation approach to improve the generalizability of the language models. Overall, this paper is the first step towards developing more robust text-to-SQL models in the medical domain. The dataset and code will be released upon acceptance.
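One plausible way to build such generalization-probing splits is to partition by SQL template so that no template seen in training appears at test time. The `sql_template` field and split criterion here are illustrative assumptions; the paper's exact split definitions may differ.

```python
import random
from collections import defaultdict

def template_split(examples, test_frac: float = 0.2, seed: int = 0):
    """Split so SQL templates never overlap between train and test, stressing
    compositional generalization rather than memorization."""
    by_template = defaultdict(list)
    for ex in examples:
        by_template[ex["sql_template"]].append(ex)
    templates = sorted(by_template)
    random.Random(seed).shuffle(templates)
    cut = int(len(templates) * (1 - test_frac))
    train = [ex for t in templates[:cut] for ex in by_template[t]]
    test = [ex for t in templates[cut:] for ex in by_template[t]]
    return train, test

examples = [
    {"q": "how many patients are over 80?", "sql_template": "COUNT WHERE age > _"},
    {"q": "how many patients are male?",    "sql_template": "COUNT WHERE sex = _"},
    {"q": "average age of smokers?",        "sql_template": "AVG WHERE _ = _"},
]
train, test = template_split(examples, test_frac=0.34)
```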
https://arxiv.org/abs/2303.12898
Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context of concept-based models, a family of AI systems that enable human feedback via concept interventions, where an expert intervenes on human-interpretable concepts relevant to the task. Prior work in this space often assumes that humans are oracles who are always certain and correct. Yet, real-world decision-making by humans is prone to occasional mistakes and uncertainty. We study how existing concept-based models deal with uncertain interventions from humans using two novel datasets: UMNIST, a visual dataset with controlled simulated uncertainty based on the MNIST dataset, and CUB-S, a relabeling of the popular CUB concept dataset with rich, densely annotated soft labels from humans. We show that training with uncertain concept labels may help mitigate weaknesses of concept-based systems when handling uncertain interventions. These results allow us to identify several open challenges, which we argue can be tackled through future multidisciplinary research on building interactive uncertainty-aware systems. To facilitate further research, we release a new elicitation platform, UElic, to collect uncertain feedback from humans in collaborative prediction tasks.
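A minimal way to model an uncertain intervention is a per-concept convex combination of model and expert beliefs, weighted by the expert's stated confidence. This blending rule is an illustrative assumption, not the paper's mechanism.

```python
import torch

def intervene(concept_probs, expert_probs, confidence):
    """Soft concept intervention: blend the model's concept probabilities with
    an expert's (possibly uncertain) estimates, per-concept."""
    return confidence * expert_probs + (1 - confidence) * concept_probs

model_c = torch.tensor([0.9, 0.2, 0.6])    # model's concept predictions
expert_c = torch.tensor([0.1, 0.2, 0.5])   # expert's soft labels ("not sure" = 0.5)
conf = torch.tensor([1.0, 0.0, 0.3])       # per-concept expert confidence
c = intervene(model_c, expert_c, conf)     # feed `c` into the label predictor
```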
https://arxiv.org/abs/2303.12872
Generalization capabilities of learning-based medical image segmentation across domains are currently limited by the performance degradation caused by domain shift, particularly for ultrasound (US) imaging. The quality of US images heavily relies on carefully tuned acoustic parameters, which vary across sonographers, machines, and settings. To improve generalizability on US images across domains, we propose MI-SegNet, a novel mutual information (MI) based framework to explicitly disentangle the anatomical and domain feature representations; therefore, robust domain-independent segmentation can be expected. Two encoders are employed to extract the relevant features for the disentanglement. The segmentation uses only the anatomical feature map for its prediction. To force the encoders to learn meaningful feature representations, a cross-reconstruction method is used during training. Transformations specific to either domain or anatomy are applied to guide the encoders in their respective feature extraction tasks. Additionally, any MI present in both feature maps is penalized to further promote separate feature spaces. We validate the generalizability of the proposed domain-independent segmentation approach on several datasets with varying parameters and machines. Furthermore, we demonstrate the effectiveness of the proposed MI-SegNet as a pre-trained model by comparing it with state-of-the-art networks.
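The disentanglement penalty can be approximated cheaply. True MI estimation needs an auxiliary network (e.g., MINE-style critics); the sketch below substitutes a cross-correlation penalty between the two feature sets as a simple stand-in for the paper's MI term.

```python
import torch

def decorrelation_penalty(anat: torch.Tensor, dom: torch.Tensor) -> torch.Tensor:
    """Penalise statistical dependence between anatomy and domain features
    (cross-correlation proxy for mutual information)."""
    a = (anat - anat.mean(0)) / (anat.std(0) + 1e-8)   # standardise per feature
    d = (dom - dom.mean(0)) / (dom.std(0) + 1e-8)
    cross = (a.T @ d) / a.size(0)                      # (Da, Dd) cross-correlation
    return cross.pow(2).mean()                         # drive all entries to zero

a, d = torch.randn(16, 64), torch.randn(16, 32)        # batch of paired features
penalty = decorrelation_penalty(a, d)   # add to segmentation + reconstruction loss
```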
https://arxiv.org/abs/2303.12649
State-of-the-art machine learning models often learn spurious correlations embedded in the training data. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications like skin cancer detection. To tackle this problem, we propose Reveal to Revise (R2R), a framework entailing the entire eXplainable Artificial Intelligence (XAI) life cycle, enabling practitioners to iteratively identify, mitigate, and (re-)evaluate spurious model behavior with a minimal amount of human interaction. In the first step (1), R2R reveals model weaknesses by finding outliers in attributions or through inspection of latent concepts learned by the model. Secondly (2), the responsible artifacts are detected and spatially localized in the input data, which is then leveraged to (3) revise the model behavior. Concretely, we apply the methods of RRR, CDEP and ClArC for model correction, and (4) (re-)evaluate the model's performance and remaining sensitivity towards the artifact. Using two medical benchmark datasets for melanoma detection and bone age estimation, we apply our R2R framework to VGG, ResNet and EfficientNet architectures and thereby reveal and correct real dataset-intrinsic artifacts, as well as synthetic variants in a controlled setting. Completing the XAI life cycle, we demonstrate multiple R2R iterations to mitigate different biases. Code is available on this https URL.
https://arxiv.org/abs/2303.12641