Recent studies have made significant progress in developing large language models (LLMs) in the medical domain that can answer expert-level questions and show potential to assist clinicians in real-world clinical scenarios. Studies have also highlighted the importance of integrating various modalities into existing LLMs for a better understanding of complex clinical contexts, which are innately multi-faceted. Although studies have demonstrated the ability of multimodal LLMs in histopathology to answer questions about given images, these models fall short of a thorough understanding of the clinical context because public datasets offer only patch-level data with limited information. Developing whole-slide image (WSI)-level multimodal LLMs (MLLMs) is therefore significant for the scalability and applicability of MLLMs in histopathology. In this study, we introduce an expert-level MLLM for histopathology using WSIs, dubbed ChatEXAONEPath. We present a retrieval-based data generation pipeline using 10,094 pairs of WSIs and histopathology reports from The Cancer Genome Atlas (TCGA). We also showcase an AI-based evaluation protocol for comprehensively understanding the medical context from the given multimodal information, evaluating generated answers against the original histopathology reports. We demonstrate the ability of ChatEXAONEPath to diagnose the given histopathology images, with an acceptance rate of 62.9% on 1,134 pairs of WSIs and reports. Our proposed model can understand pan-cancer WSIs and the clinical context of various cancer types. We argue that our model has the potential to assist clinicians by comprehensively understanding the complex morphology of WSIs for cancer diagnosis through the integration of multiple modalities.
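A minimal sketch of what such an LLM-as-judge acceptance check could look like, assuming an OpenAI-style chat API; the judge model, prompt wording, and ACCEPT/REJECT protocol are illustrative assumptions, not the paper's actual evaluation implementation:

```python
from openai import OpenAI

client = OpenAI()

def judge_answer(report: str, generated: str) -> bool:
    """Ask an LLM judge whether a generated diagnosis is clinically
    consistent with the original histopathology report."""
    prompt = (
        "You are a pathology expert. Compare the generated diagnosis with the "
        "reference report and answer ACCEPT or REJECT.\n\n"
        f"Reference report:\n{report}\n\nGenerated diagnosis:\n{generated}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return "ACCEPT" in resp.choices[0].message.content.upper()

def acceptance_rate(pairs) -> float:
    """Fraction of (report, answer) pairs accepted, as in the 62.9% figure."""
    return sum(judge_answer(r, a) for r, a in pairs) / len(pairs)
```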
https://arxiv.org/abs/2504.13023
Deep learning-based point cloud modeling has been widely investigated as an indispensable component of general shape analysis. Recently, transformers and state space models (SSMs) have shown promising capability in point cloud learning. However, limited research has been conducted on medical point clouds, which have great potential in disease diagnosis and treatment. This paper presents an SSM-based hierarchical feature learning framework for medical point cloud understanding. Specifically, we down-sample the input into multiple levels through farthest point sampling. At each level, we perform a series of k-nearest neighbor (KNN) queries to aggregate multi-scale structural information. To help the SSM process point clouds, we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points. Point features are calculated progressively from short neighbor sequences and long point sequences through vanilla and group Point SSM blocks, capturing both local patterns and long-range dependencies. To evaluate the proposed method, we build a large-scale medical point cloud dataset named MedPointS for anatomy classification, completion, and segmentation. Extensive experiments on MedPointS demonstrate that our method achieves superior performance across all tasks. The dataset is available at this https URL. The code has been merged into a public medical imaging platform: this https URL.
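As an illustration of the down-sampling and neighborhood-query steps, here is a minimal NumPy sketch of farthest point sampling followed by a KNN query at one level; it is a generic implementation of these standard operations, not the authors' code:

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    """Greedily select m point indices that maximize coverage of the cloud."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0                       # start from an arbitrary point
    for i in range(1, m):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)      # distance to the chosen set so far
        chosen[i] = int(np.argmax(dist))
    return chosen

def knn_indices(points: np.ndarray, centers: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each center point."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]

pts = np.random.rand(1024, 3)
centers = pts[farthest_point_sampling(pts, 256)]    # one hierarchy level
neighbors = knn_indices(pts, centers, k=16)         # (256, 16) neighborhoods
```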
https://arxiv.org/abs/2504.13015
Contemporary digital technology plays a pivotal role in the design of customized medical appliances, including occlusal splints used in the treatment of stomatognathic system dysfunctions. We present an approach to the computer-aided design and precision assessment of positioning occlusal splints, bridging clinical concepts with current digital dental practice. In our model, a 3D splint is generated from a transformation matrix that represents the therapeutic change in mandibular position, defined by a specialist using a virtual patient model reconstructed from intraoral scans, CBCT, 3D facial scans and plaster model digitisation. The paper introduces a novel method for generating splints that accurately reproduce occlusal conditions in the therapeutic position, including a mechanism for resolving surface conflicts through virtual embossing. We demonstrate how transformation matrices can be acquired with clinical tools and intraoral devices, and evaluate the accuracy of the designed and printed splints using profile and surface deviation analysis. The proposed method enables reproducible, patient-specific splint fabrication and opens new possibilities in diagnostics, multimodal image registration and quantification of occlusal discrepancies.
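To make the transformation-matrix idea concrete, the hedged sketch below applies a 4x4 homogeneous transform (standing in for the therapeutic change in mandibular position) to the vertices of a digitized mesh; the matrix values are purely illustrative:

```python
import numpy as np

def apply_transform(vertices: np.ndarray, T: np.ndarray) -> np.ndarray:
    """vertices: (N, 3) mesh points; T: (4, 4) homogeneous transform."""
    homo = np.hstack([vertices, np.ones((vertices.shape[0], 1))])
    return (homo @ T.T)[:, :3]

# Example: a 2 mm protrusive advancement combined with a small rotation.
theta = np.deg2rad(1.5)
T = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 2.0],   # translation in mm
    [np.sin(theta),  np.cos(theta), 0.0, 0.0],
    [0.0,            0.0,           1.0, 0.0],
    [0.0,            0.0,           0.0, 1.0],
])
mandible = np.random.rand(5000, 3) * 50           # stand-in for scan vertices
therapeutic_position = apply_transform(mandible, T)
```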
https://arxiv.org/abs/2504.12868
Pap smear image segmentation is crucial for cervical cancer diagnosis. However, traditional segmentation models often struggle with the complex cellular structures and variations in Pap smear images. This study proposes a hybrid Dense-UNet201 optimization approach that integrates a pretrained DenseNet201 as the encoder of the U-Net architecture and optimizes it using the spider monkey optimization (SMO) algorithm. The Dense-UNet201 model excelled at feature extraction, and the SMO was modified to handle categorical and discrete parameters. The models were trained and evaluated on the SIPaKMeD dataset using key performance metrics, including loss, accuracy, Intersection over Union (IoU), and the Dice coefficient. The experimental results showed that Dense-UNet201 outperformed U-Net, Res-UNet50, and Efficient-UNetB0: SMO Dense-UNet201 achieved a segmentation accuracy of 96.16%, an IoU of 91.63%, and a Dice coefficient of 95.63%. These findings underscore the effectiveness of image preprocessing, pretrained models, and metaheuristic optimization in improving medical image analysis, and provide new insights into cervical cell segmentation methods.
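For reference, the two headline metrics can be computed as follows; this is a generic sketch of IoU and the Dice coefficient for binary masks, with the 0.5 threshold as an assumption:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over Union for binary segmentation masks."""
    pred, gt = pred > 0.5, gt > 0.5
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float((inter + eps) / (union + eps))

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred > 0.5, gt > 0.5
    inter = np.logical_and(pred, gt).sum()
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))
```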
https://arxiv.org/abs/2504.12807
Chinese-Vicuna is an open-source, resource-efficient language model designed to bridge the gap in Chinese instruction-following capabilities by fine-tuning Meta's LLaMA architecture with Low-Rank Adaptation (LoRA). Targeting low-resource environments, it enables cost-effective deployment on consumer GPUs (e.g., an RTX 2080 Ti for 7B models) and supports domain-specific adaptation in fields like healthcare and law. By integrating hybrid datasets (BELLE and Guanaco) and 4-bit quantization (QLoRA), the model achieves competitive performance in tasks such as translation, code generation, and domain-specific Q&A. The project provides a comprehensive toolkit for model conversion, CPU inference, and multi-turn dialogue interfaces, emphasizing accessibility for researchers and developers. Evaluations indicate competitive performance across medical tasks, multi-turn dialogue coherence, and real-time legal updates. Chinese-Vicuna's modular design, open-source ecosystem, and community-driven enhancements position it as a versatile foundation for Chinese LLM applications.
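A hedged sketch of LoRA fine-tuning with Hugging Face PEFT, the technique the project applies to LLaMA; the base checkpoint name and hyperparameters are illustrative, not Chinese-Vicuna's actual configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "decapoda-research/llama-7b-hf"         # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                        # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],        # LLaMA attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()              # only the small adapters train
```

Because only the low-rank adapters receive gradients, the memory footprint stays within reach of a single consumer GPU, which is the point of the project's design.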
https://arxiv.org/abs/2504.12737
This paper introduces AdaptoVision, a novel convolutional neural network (CNN) architecture designed to efficiently balance computational complexity and classification accuracy. By leveraging enhanced residual units, depth-wise separable convolutions, and hierarchical skip connections, AdaptoVision significantly reduces parameter count and computational requirements while preserving competitive performance across various benchmark and medical image datasets. Extensive experimentation demonstrates that AdaptoVision achieves state-of-the-art results on the BreakHis dataset and comparable accuracy elsewhere, notably 95.3% on CIFAR-10 and 85.77% on CIFAR-100, without relying on any pretrained weights. The model's streamlined architecture and strategic simplifications promote effective feature extraction and robust generalization, making it particularly suitable for deployment in real-time and resource-constrained environments.
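The parameter savings come largely from the depth-wise separable convolutions; the PyTorch sketch below shows the standard form of such a block, with channel sizes chosen for illustration rather than taken from AdaptoVision:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Per-channel spatial convolution (groups=in_ch) ...
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # ... followed by a 1x1 pointwise convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A standard 3x3 conv from 64 to 128 channels has 64*128*9 = 73,728 weights;
# the separable version below has 64*9 + 64*128 = 8,768.
block = DepthwiseSeparableConv(64, 128)
out = block(torch.randn(1, 64, 32, 32))
```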
https://arxiv.org/abs/2504.12652
Labeling has always been expensive in the medical context, which has hindered related deep learning applications. Our work introduces active learning into surgical video frame selection to construct a high-quality, affordable laparoscopic cholecystectomy dataset for semantic segmentation. Active learning allows the deep neural network (DNN) learning pipeline to include the dataset construction workflow: DNNs trained on the existing dataset identify the most informative frames among the newly collected data. At the same time, the DNNs' performance and generalization ability improve over time as the newly selected and annotated data are added to the training set. We assessed different data-informativeness measurements and found that deep feature distances select the most informative data in this task. Our experiments show that with half of the data selected by active learning, the DNNs achieve almost the same performance on the critical anatomies and surgical instruments, with a mean Intersection over Union (mIoU) of 0.4349, as the same DNNs trained on the full dataset (0.4374 mIoU).
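A minimal sketch of informativeness ranking by deep-feature distance, as we read the selection criterion: unlabeled frames whose features are farthest from the current labeled set are chosen for annotation. The feature extractor and array shapes are assumptions:

```python
import numpy as np

def select_informative(unlabeled_feats: np.ndarray,
                       labeled_feats: np.ndarray,
                       budget: int) -> np.ndarray:
    """Return indices of the `budget` most informative unlabeled frames,
    scored by distance to their nearest labeled neighbor in feature space."""
    # (U, L) pairwise distances between unlabeled and labeled feature vectors
    d = np.linalg.norm(
        unlabeled_feats[:, None, :] - labeled_feats[None, :, :], axis=-1)
    nearest = d.min(axis=1)                    # distance to closest labeled frame
    return np.argsort(nearest)[::-1][:budget]  # farthest (least covered) first
```

After annotating the selected frames, they move into the labeled pool and the model is retrained, closing the active learning loop the abstract describes.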
https://arxiv.org/abs/2504.12573
Elliptical shape prior information plays a vital role in improving the accuracy of image segmentation for specific tasks in medical and natural images. Existing deep learning-based segmentation methods, including the Segment Anything Model (SAM), often struggle to efficiently produce segmentation results with elliptical shapes. This paper proposes a new approach that integrates an elliptical shape prior into deep learning-based SAM image segmentation using variational methods. The proposed method establishes a parameterized elliptical contour field, which constrains the segmentation results to align with predefined elliptical contours. Using the dual algorithm, the model seamlessly integrates image features with elliptical and spatial regularization priors, thereby greatly enhancing segmentation accuracy. By decomposing SAM into four mathematical sub-problems, we integrate the variational ellipse prior into a new SAM network structure, ensuring that SAM's segmentation output consists of elliptical regions. Experimental results on several specific image datasets demonstrate an improvement over the original SAM.
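To illustrate the parameterized elliptical contour field, the sketch below builds the standard implicit function of a rotated ellipse, negative inside and zero on the contour; the parameterization is textbook, not the paper's exact construction:

```python
import numpy as np

def ellipse_field(h, w, cx, cy, a, b, theta):
    """phi(x, y) <= 0 inside an ellipse centered at (cx, cy) with semi-axes
    a and b, rotated by theta radians."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x, y = xs - cx, ys - cy
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate into ellipse frame
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (xr / a) ** 2 + (yr / b) ** 2 - 1.0

# Constraining segmentation to the sub-level set enforces an elliptical shape.
mask = ellipse_field(128, 128, cx=64, cy=64, a=40, b=24, theta=0.3) <= 0
```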
https://arxiv.org/abs/2504.12556
Objective: Create precise, structured, data-backed guidelines for type 2 diabetes treatment progression, suitable for clinical adoption. Research Design and Methods: Our training cohort comprised visits by patients with type 2 diabetes to Boston Medical Center (BMC) from 1998 to 2014. We divide visits into 4 groups based on the patient's treatment regimen before the visit, and further divide them into subgroups based on the treatment recommended during the visit. Since each subgroup contains observational data with confounding bias (sicker patients are prescribed more aggressive treatments), we used machine learning and optimization to remove some data points so that the remaining data resembles a randomized trial. On each subgroup, we train AI-backed tree-based models to prescribe treatment changes. Once these tree models are trained, we manually combine the models for every group to create an end-to-end prescription pipeline for all patients in that group. In this process, we prioritize stepping up to a more aggressive treatment before considering less aggressive options. We tested this pipeline on unseen data from BMC and on an external dataset from Hartford HealthCare (type 2 diabetes patient visits from January 2020 to May 2024). Results: The median HbA1c reduction achieved by our pipelines is 0.26% greater than what the doctors achieved on the unseen BMC patients; on the Hartford cohort, our pipelines were better by 0.13%. Conclusions: This precise, interpretable, and efficient AI-backed approach to treatment progression in type 2 diabetes is predicted to outperform current practice and can be deployed to improve patient outcomes.
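As a toy illustration of one subgroup's interpretable prescribing model, the sketch below fits a shallow decision tree and prints it as human-readable rules; the features, labels, and data are placeholders, not the BMC cohort:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["hba1c", "bmi", "age", "months_on_regimen"]   # assumed features
X = np.random.rand(500, 4) * [14, 50, 90, 60]                  # placeholder visits
y = np.random.choice(["no change", "intensify oral", "add insulin"], 500)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# A depth-3 tree reads as a short, structured guideline a clinician can audit.
print(export_text(tree, feature_names=feature_names))
```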
https://arxiv.org/abs/2504.12417
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), learn directly from image data, radiomics-based models extract and analyze quantitative features, potentially providing advantages in data-limited scenarios. This study systematically compares the diagnostic accuracy and robustness of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines (SVM), and Multi-Layer Perceptrons (MLP) for radiomics, against state-of-the-art computer vision deep learning architectures. Performance metrics across varying sample sizes reveal insights into each model's efficacy, highlighting the contexts in which specific AI approaches may offer enhanced diagnostic capabilities. The results aim to inform the integration of AI-driven diagnostic tools in clinical practice, particularly in automated and high-throughput environments where timely, reliable diagnosis is critical. This comparative study addresses an essential gap, establishing guidance for the selection of AI models based on clinical and operational needs.
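A hedged sketch of the radiomics-side comparison using scikit-learn, with feature extraction omitted and the feature matrix X assumed to hold precomputed radiomic features:

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

models = {
    "DecisionTree": DecisionTreeClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500),
}

# X: (n_samples, n_radiomic_features); y: labels such as
# {COVID-19, lung opacity, viral pneumonia, normal}.
def compare(X, y):
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Repeating such a comparison at several training-set sizes is what exposes the data-limited regimes where radiomics models can rival the deep learning architectures.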
https://arxiv.org/abs/2504.12249
Biomedical images often contain objects known to be spatially correlated or nested due to their inherent properties, leading to semantic relations. Examples include cell nuclei nested within eukaryotic cells and colonies growing exclusively within their culture dishes. While these semantic relations bear key importance, detection tasks are often formulated independently, requiring multi-shot analysis pipelines. Importantly, spatial correlation could constitute a fundamental prior facilitating the learning of more meaningful representations for tasks like instance segmentation. This knowledge has, thus far, not been utilised by the biomedical computer vision community. We argue that the instance segmentation of two or more categories of objects can be achieved in parallel. We achieve this via two architectures, HydraStarDist (HSD) and the novel HSD-WBR, based on the widely-used StarDist (SD), to take advantage of the star-convexity of our target objects. HSD and HSD-WBR are constructed to take object interactions into account as constraints. HSD implicitly incorporates spatial correlation priors based on object interaction through a joint encoder. HSD-WBR further enforces the prior in a regularisation layer with a penalty we propose, named the Within Boundary Regularisation Penalty (WBR). Both architectures achieve nested instance segmentation in a single shot. We demonstrate their competitiveness based on $IoU_R$ and AP, and their superiority on a new, task-relevant criterion, the Joint TP rate (JTPR), compared to their baseline SD and Cellpose. Our approach can be further modified to capture partial inclusion/exclusion in multi-object interactions in fluorescent or brightfield microscopy or digital imaging. Finally, our strategy suggests gains by making this learning single-shot and computationally efficient.
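The sketch below shows one plausible reading of a within-boundary penalty: probability mass assigned to the inner class (nuclei) outside the predicted outer class (cells) is penalized. It is our illustration of the idea, not the authors' exact WBR formulation:

```python
import torch

def within_boundary_penalty(inner_prob: torch.Tensor,
                            outer_prob: torch.Tensor) -> torch.Tensor:
    """inner_prob, outer_prob: (B, H, W) probabilities of the nested (inner)
    and enclosing (outer) object classes."""
    outside = 1.0 - outer_prob                # soft 'outside the outer object' mask
    return (inner_prob * outside).mean()      # nuclei mass leaking outside cells

# Usage (as one regularisation term among the segmentation losses):
# total = seg_loss_inner + seg_loss_outer + lam * within_boundary_penalty(p_nuc, p_cell)
```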
https://arxiv.org/abs/2504.12078
Many existing digital triage systems are questionnaire-based, guiding patients to appropriate care levels based on information (e.g., symptoms, medical history, and urgency) provided by the patients answering questionnaires. Such a system often uses a deterministic model with predefined rules to determine care levels, and it faces challenges with incomplete triage interviews since it can only assist patients who finish the process. In this study, we explore the use of machine learning (ML) to predict the outcomes of unfinished interviews, aiming to enhance patient care and service quality. Predicting triage outcomes from incomplete data is crucial for patient safety and healthcare efficiency. Our findings show that decision-tree models, particularly LGBMClassifier and CatBoostClassifier, achieve over 80% accuracy in predicting outcomes from complete interviews, with a linear correlation between prediction accuracy and the degree of interview completeness. For example, LGBMClassifier achieves 88.2% prediction accuracy for interviews with 100% completeness, 79.6% for 80% completeness, 58.9% for 60% completeness, and 45.7% for 40% completeness. The TabTransformer model demonstrated exceptional accuracy of over 80% at all degrees of completeness but required extensive training time, indicating a need for more powerful computational resources. The study highlights the linear correlation between interview completeness and the predictive power of the decision-tree models.
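A minimal sketch of the reported best performer, LGBMClassifier, evaluated at several completeness levels; the feature layout, masking scheme, and placeholder data are assumptions for illustration (LightGBM handles missing values natively, which makes the masking straightforward):

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def mask_to_completeness(X: np.ndarray, completeness: float) -> np.ndarray:
    """Simulate an unfinished interview by masking trailing answers as NaN."""
    X = X.copy()
    keep = int(X.shape[1] * completeness)
    X[:, keep:] = np.nan
    return X

X = np.random.rand(1000, 40)                 # placeholder interview features
y = np.random.randint(0, 5, 1000)            # placeholder care levels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

model = LGBMClassifier().fit(X_tr, y_tr)
for c in (1.0, 0.8, 0.6, 0.4):
    acc = accuracy_score(y_te, model.predict(mask_to_completeness(X_te, c)))
    print(f"completeness {c:.0%}: accuracy {acc:.3f}")
```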
https://arxiv.org/abs/2504.11977
X-ray imaging plays a crucial role in the medical field, providing essential insights into the internal anatomy of patients for diagnostics, image-guided procedures, and clinical decision-making. Traditional techniques often require multiple X-ray projections from various angles to obtain a comprehensive view, leading to increased radiation exposure and more complex clinical processes. This paper explores an innovative approach using the DL-GIPS model, which synthesizes X-ray projections from new viewpoints by leveraging a single existing projection. The model strategically manipulates geometry and texture features extracted from an initial projection to match new viewing angles. It then synthesizes the final projection by merging these modified geometry features with consistent texture information through an advanced image generation process. We demonstrate the effectiveness and broad applicability of the DL-GIPS framework through lung imaging examples, highlighting its potential to revolutionize stereoscopic and volumetric imaging by minimizing the need for extensive data acquisition.
https://arxiv.org/abs/2504.11953
Feature matching across video streams remains a cornerstone challenge in computer vision. Increasingly, robust multimodal matching has garnered interest in robotics, surveillance, remote sensing, and medical imaging. While traditional methods rely on detecting and matching spatial features, they break down when faced with noisy, misaligned, or cross-modal data. Recent deep learning methods have improved robustness through learned representations, but remain constrained by their dependence on extensive training data and computational demands. We present Flow Intelligence, a paradigm-shifting approach that moves beyond spatial features by focusing exclusively on temporal motion patterns. Instead of detecting traditional keypoints, our method extracts motion signatures from pixel blocks across consecutive frames and matches these temporal motion signatures between videos. These motion-based descriptors achieve natural invariance to translation, rotation, and scale variations while remaining robust across different imaging modalities. This novel approach requires no pretraining data, eliminates the need for spatial feature detection, enables cross-modal matching using only temporal motion, and outperforms existing methods in challenging scenarios where traditional approaches fail. By leveraging motion rather than appearance, Flow Intelligence enables robust, real-time video feature matching in diverse environments.
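A conceptual sketch of the motion-signature idea: each pixel block is described by the time series of its mean frame-to-frame change, and blocks from two videos are matched by correlating those signatures. The block size and correlation measure are our assumptions, not the paper's exact descriptors:

```python
import numpy as np

def motion_signatures(frames: np.ndarray, block: int = 16) -> np.ndarray:
    """frames: (T, H, W) grayscale video -> (n_blocks, T-1) signatures."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))    # (T-1, H, W)
    T1, H, W = diffs.shape
    sig = diffs[:, :H - H % block, :W - W % block]           # crop to block grid
    sig = sig.reshape(T1, H // block, block, W // block, block).mean(axis=(2, 4))
    return sig.reshape(T1, -1).T                             # blocks x time

def match_blocks(sig_a: np.ndarray, sig_b: np.ndarray) -> np.ndarray:
    """For each block in video A, the best-correlated block in video B."""
    a = (sig_a - sig_a.mean(1, keepdims=True)) / (sig_a.std(1, keepdims=True) + 1e-8)
    b = (sig_b - sig_b.mean(1, keepdims=True)) / (sig_b.std(1, keepdims=True) + 1e-8)
    corr = a @ b.T / a.shape[1]                              # normalized correlation
    return corr.argmax(axis=1)
```

Because the descriptor depends only on when motion happens, not what the pixels look like, the same matching works across modalities where appearance-based features fail.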
https://arxiv.org/abs/2504.11949
Current sign language machine translation systems rely on recognizing hand movements, facial expressions, and body postures, together with natural language processing, to convert signs into text. Recent approaches use Transformer architectures to model long-range dependencies via positional encoding. However, they lack accuracy in recognizing fine-grained, short-range temporal dependencies between gestures captured at high frame rates, and their high computational complexity leads to inefficient training. To mitigate these issues, we propose an Adaptive Transformer (ADAT), which incorporates components for enhanced feature extraction and adaptive feature weighting through a gating mechanism that emphasizes contextually relevant features while reducing training overhead and maintaining translation accuracy. To evaluate ADAT, we introduce MedASL, the first public medical American Sign Language dataset. In sign-to-gloss-to-text experiments, ADAT outperforms the encoder-decoder transformer, improving BLEU-4 accuracy by 0.1% while reducing training time by 14.33% on PHOENIX14T and 3.24% on MedASL. In sign-to-text experiments, it improves accuracy by 8.7% and reduces training time by 2.8% on PHOENIX14T, and achieves 4.7% higher accuracy and 7.17% faster training on MedASL. Compared to encoder-only and decoder-only baselines in sign-to-text, ADAT is at least 6.8% more accurate despite being up to 12.1% slower due to its dual-stream structure.
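A small PyTorch sketch of the kind of gating mechanism described: a learned sigmoid gate reweights features by contextual relevance. Its dimensions and placement within the transformer are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FeatureGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq_len, dim) per-frame features."""
        g = self.gate(x)          # per-feature weights in (0, 1)
        return x * g              # emphasize contextually relevant features

feats = torch.randn(2, 128, 512)  # e.g., 128 video frames, 512-dim features
weighted = FeatureGate(512)(feats)
```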
https://arxiv.org/abs/2504.11942
Blood vessel networks in the brain play a crucial role in stroke research, where understanding their topology is essential for analyzing blood flow dynamics. However, extracting detailed topological vessel network information from microscopy data remains a significant challenge, mainly due to the scarcity of labeled training data and the need for high topological accuracy. This work combines synthetic data generation with deep learning to automatically extract vessel networks as graphs from volumetric microscopy data. To combat data scarcity, we introduce a comprehensive pipeline for generating large-scale synthetic datasets that mirror the characteristics of real vessel networks. Our three-stage approach progresses from abstract graph generation through vessel mask creation to realistic medical image synthesis, incorporating biological constraints and imaging artifacts at each stage. Using this synthetic data, we develop a two-stage deep learning pipeline of 3D U-Net-based models for node detection and edge prediction. Fine-tuning on real microscopy data shows promising adaptation, improving edge prediction F1 scores from 0.496 to 0.626 by training on merely 5 manually labeled samples. These results suggest that automated vessel network extraction is becoming practically feasible, opening new possibilities for large-scale vascular analysis in stroke research.
https://arxiv.org/abs/2504.11858
Root canal (RC) treatment is a highly delicate and technically complex procedure in clinical practice, heavily influenced by clinicians' experience and subjective judgment. Deep learning has made significant advancements in computer-aided diagnosis (CAD) because it can provide more objective and accurate diagnostic results. However, its application in RC treatment is still relatively rare, mainly due to the lack of public datasets in this field. To address this issue, we establish a first-molar root canal segmentation dataset called FMRC-2025. Additionally, to alleviate dentists' manual annotation workload and fully leverage unlabeled data, we design a Cross-Frequency Collaborative semi-supervised learning (SSL) network called CFC-Net. It consists of two components: (1) a Cross-Frequency Collaborative Mean Teacher (CFC-MT), which introduces two specialized students (SS) and one comprehensive teacher (CT) for collaborative multi-frequency training; the CT and the SS are trained on different frequency components while fully integrating multi-frequency knowledge through cross- and full-frequency consistency supervision. (2) An uncertainty-guided Cross-Frequency Mix (UCF-Mix) mechanism enables the network to generate high-confidence pseudo-labels while learning to integrate multi-frequency information and maintain the structural integrity of the targets. Extensive experiments on FMRC-2025 and three public dental datasets demonstrate that CFC-Net is effective for RC segmentation and also exhibits strong generalizability on other dental segmentation tasks, outperforming state-of-the-art SSL medical image segmentation methods. Code and dataset will be released.
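To make the multi-frequency training idea concrete, the sketch below splits an image into low- and high-frequency components with an FFT mask, the kind of decomposition on which different students could be trained; the cutoff radius is arbitrary, and the exact decomposition used by CFC-Net is not specified here:

```python
import numpy as np

def frequency_split(img: np.ndarray, radius: int = 16):
    """Split a 2D image into low- and high-frequency components via an
    ideal low-pass mask in the shifted Fourier domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    low_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = img - low                          # residual high-frequency detail
    return low, high

slice_2d = np.random.rand(256, 256)           # stand-in for a CBCT slice
low, high = frequency_split(slice_2d)
```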
https://arxiv.org/abs/2504.11856
Diffusion Probabilistic Models (DPMs) have demonstrated significant potential in 3D medical image segmentation tasks. However, their high computational cost and inability to fully capture global 3D contextual information limit their practical applications. To address these challenges, we propose a novel text-guided diffusion model framework, TextDiffSeg. This method leverages a conditional diffusion framework that integrates 3D volumetric data with natural language descriptions, enabling cross-modal embedding and establishing a shared semantic space between visual and textual modalities. To enhance the model's ability to recognize complex anatomical structures, TextDiffSeg incorporates innovative label embedding techniques and cross-modal attention mechanisms, effectively reducing computational complexity while preserving global 3D contextual integrity. Experimental results demonstrate that TextDiffSeg consistently outperforms existing methods in segmentation tasks involving kidney and pancreas tumors, as well as multi-organ segmentation scenarios. Ablation studies further validate the effectiveness of key components, highlighting the synergistic interaction between the text fusion, image feature extractor, and label encoder. TextDiffSeg provides an efficient and accurate solution for 3D medical image segmentation, showcasing its broad applicability in clinical diagnosis and treatment planning.
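A minimal sketch of the cross-modal attention idea, in which visual tokens attend to text tokens so the two modalities share a semantic space; it uses PyTorch's built-in multi-head attention, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        """visual: (B, Nv, dim) volume feature tokens; text: (B, Nt, dim)."""
        fused, _ = self.attn(query=visual, key=text, value=text)
        return self.norm(visual + fused)      # residual cross-modal fusion

vis = torch.randn(1, 512, 256)   # flattened 3D feature tokens
txt = torch.randn(1, 16, 256)    # tokens from a description like "kidney tumor"
out = CrossModalAttention(256)(vis, txt)
```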
https://arxiv.org/abs/2504.11825
The ability to predict drug overdose risk from a patient's medical records is crucial for timely intervention and prevention. Traditional machine learning models have shown promise in analyzing longitudinal medical records for this task. However, recent advancements in large language models (LLMs) offer an opportunity to enhance prediction performance by leveraging their ability to process long textual data and their inherent prior knowledge across diverse tasks. In this study, we assess the effectiveness of OpenAI's GPT-4o LLM in predicting drug overdose events using patients' longitudinal insurance claims records. We evaluate its performance in both fine-tuned and zero-shot settings, comparing them to strong traditional machine learning methods as baselines. Our results show that LLMs not only outperform traditional models in certain settings but can also predict overdose risk in a zero-shot setting without task-specific training. These findings highlight the potential of LLMs in clinical decision support, particularly for drug overdose risk prediction.
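A hedged sketch of the zero-shot setting, assuming the claims history is serialized into text and sent to GPT-4o through the OpenAI chat API; the prompt format and label parsing are our assumptions, not the study's protocol:

```python
from openai import OpenAI

client = OpenAI()

def predict_overdose_risk(claims_history: str) -> bool:
    """Zero-shot binary risk judgment from a serialized claims history."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Given this patient's chronological insurance claims history, "
                "will the patient experience a drug overdose event? "
                "Answer YES or NO.\n\n" + claims_history
            ),
        }],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```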
https://arxiv.org/abs/2504.11792
Medical Visual Question Answering (MVQA) systems can interpret medical images in response to natural language queries. However, linguistic variability in question phrasing often undermines the consistency of these systems. To address this challenge, we propose a Semantically Equivalent Question Augmentation (SEQA) framework, which leverages large language models (LLMs) to generate diverse yet semantically equivalent rephrasings of questions, enriching linguistic diversity while preserving semantic meaning. We further introduce an evaluation metric, the Total Agreement Rate with Semantically Equivalent Input and Correct Answer (TAR-SC), which assesses a model's capability to generate consistent and correct responses to semantically equivalent linguistic variations. In addition, we propose three diversity metrics: the average number of QA items per image (ANQI), the average number of questions per image with the same answer (ANQA), and the average number of open-ended questions per image with the same semantics (ANQS). Using the SEQA framework, we augmented the benchmark MVQA public datasets SLAKE, VQA-RAD, and PathVQA. All three datasets achieved significant improvements by incorporating more semantically equivalent questions: ANQI increased by an average of 86.1, ANQA by 85.1, and ANQS by 46. Subsequent experiments evaluate three MVQA models (M2I2, MUMC, and BiomedGPT) under both zero-shot and fine-tuning settings on the enhanced datasets. The results show that fine-tuned models achieve an average accuracy improvement of 19.35%, while our proposed TAR-SC metric shows an average improvement of 11.61%, indicating a substantial enhancement in model consistency.
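An illustrative computation of TAR-SC as we read its definition: the fraction of question groups for which the model answers every semantically equivalent rephrasing both consistently and correctly. The data structures are assumed:

```python
def tar_sc(groups, model_answer):
    """groups: list of (rephrasings, correct_answer) pairs, where rephrasings
    are semantically equivalent variants of one question;
    model_answer: callable mapping a question string to the model's answer."""
    agree = 0
    for rephrasings, correct in groups:
        answers = [model_answer(q) for q in rephrasings]
        if all(a == correct for a in answers):   # consistent AND correct
            agree += 1
    return agree / len(groups)
```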
https://arxiv.org/abs/2504.11777