CellMix: A General Instance Relationship based Method for Data Augmentation Towards Pathology Image Analysis

Abstract

Pathology image analysis depends critically on the availability and quality of annotated pathological samples, which are difficult to collect and require substantial human effort. To address this issue, mixing-based approaches offer an effective and practical complement to traditional preprocessing-based data augmentation. However, previous mixing-based data augmentation methods do not thoroughly explore the essential characteristics of pathology images, including local specificity, global distribution, and inner-/outer-sample instance relationships. To better capture these characteristics and generate effective pseudo samples, we propose the CellMix framework with a novel distribution-based in-place shuffle strategy. We split the images into patches at the granularity of pathology instances and shuffle the patches across the samples of the same batch. In this way, we generate new samples while keeping the absolute relationship of pathology instances intact. Furthermore, to deal with the resulting perturbations and distribution-based noise, we devise a loss-driven strategy inspired by curriculum learning, which lets the model fit the augmented data adaptively during training. Notably, we are the first to explore data augmentation techniques in the pathology image field. Experiments show state-of-the-art (SOTA) results on 7 different datasets. We conclude that this novel instance relationship-based strategy can shed light on general data augmentation for pathology image analysis. The code is available online.
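The in-place shuffle described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "in-place" means a patch keeps its grid position and is only exchanged between samples of the same batch, that a fixed fraction of grid positions is shuffled, and that soft labels are formed from the share of patches contributed by each source sample. The function name `inplace_patch_shuffle` and the parameters `patch_size` and `mix_ratio` are hypothetical, and the loss-driven curriculum schedule is not modelled.

```python
import numpy as np

def inplace_patch_shuffle(images, labels, patch_size, mix_ratio, rng):
    """Hypothetical sketch of a batch-level in-place patch shuffle.

    images: (B, H, W, C) array; labels: (B, K) one-hot array.
    Returns the mixed images, soft labels, and a (B, n_positions) map
    giving the source sample of every patch in every output image.
    """
    b, h, w, c = images.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    n_pos = gh * gw

    # Cut each image into a grid of patches: (B, n_pos, p, p, C).
    patches = (images.reshape(b, gh, patch_size, gw, patch_size, c)
                     .transpose(0, 1, 3, 2, 4, 5)
                     .copy())
    flat = patches.reshape(b, n_pos, patch_size, patch_size, c)

    # source[i, j] = index of the sample that supplies patch j of output i.
    source = np.tile(np.arange(b)[:, None], (1, n_pos))

    # Assumption: a fixed fraction of grid positions is shuffled.
    n_mix = int(round(mix_ratio * n_pos))
    for j in rng.choice(n_pos, size=n_mix, replace=False):
        perm = rng.permutation(b)
        # Exchange patches across the batch at the SAME grid position,
        # so every patch keeps its absolute location in the image.
        flat[:, j] = flat[perm, j]
        source[:, j] = perm

    # Reassemble the patch grid into full images.
    mixed = (flat.reshape(b, gh, gw, patch_size, patch_size, c)
                 .transpose(0, 1, 3, 2, 4, 5)
                 .reshape(b, h, w, c))

    # Assumption: soft label = mean one-hot label of the source samples.
    soft_labels = labels[source].mean(axis=1)
    return mixed, soft_labels, source
```

Under these assumptions, a call such as `inplace_patch_shuffle(batch, one_hot, patch_size=32, mix_ratio=0.5, rng=np.random.default_rng(0))` returns images of the same shape as the input, with half of the grid positions filled by patches from other samples of the batch. Setting `mix_ratio=0.0` returns the batch unchanged.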

URL

https://arxiv.org/abs/2301.11513

PDF

https://arxiv.org/pdf/2301.11513.pdf