Abstract
Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs suffer from redundant patches and unknown patch positions, because pathologists capture the fields of view subjectively. Moreover, automatically generating pathology captions remains a significant challenge. To address these issues, we introduce GNN-ViTCap, a novel framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings with deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model's input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis.
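The graph-construction and projection steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the similarity measure, the number of neighbors, the GNN layer type, or the pooling operation, so cosine similarity, `k=2`, a single mean-aggregation step, and mean pooling are all assumptions made here for illustration. The function names and the random weights are hypothetical.

```python
import numpy as np


def knn_graph(embeddings, k):
    """Symmetric 0/1 adjacency linking each patch to its k most similar patches.

    Assumption: cosine similarity is used as the similarity matrix.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbor
    n = sim.shape[0]
    neighbors = np.argsort(-sim, axis=1)[:, :k]  # indices of the k largest similarities
    adj = np.zeros((n, n), dtype=float)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def mean_aggregate(embeddings, adj):
    """One message-passing step: average each node with its neighbors.

    Assumption: a plain mean aggregator stands in for the paper's GNN.
    """
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    degree = a_hat.sum(axis=1, keepdims=True)
    return (a_hat @ embeddings) / degree


def project_to_llm_space(image_embedding, weight, bias):
    """Linear layer mapping the pooled image embedding to the LLM input size."""
    return image_embedding @ weight + bias


rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 8))        # 6 representative patches, 8-dim embeddings
adj = knn_graph(patches, k=2)
nodes = mean_aggregate(patches, adj)
image_embedding = nodes.mean(axis=0)     # assumption: mean pooling over nodes
weight = rng.normal(size=(8, 16))        # hypothetical, untrained projection weights
bias = np.zeros(16)
visual_token = project_to_llm_space(image_embedding, weight, bias)
```

In a full pipeline, `visual_token` would be combined with the caption token embeddings before fine-tuning the language model. The dimensions used here (8 and 16) are placeholders and not values from the paper.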
URL
https://arxiv.org/abs/2507.07006