Abstract
Data generated in clinical practice often exhibit biases, such as long-tail imbalance and algorithmic unfairness. This study aims to mitigate these challenges through data synthesis. Previous efforts in medical imaging synthesis have struggled to separate lesion information from background context, which makes it difficult to generate high-quality backgrounds and limits control over the synthetic output. Inspired by diffusion-based image inpainting, we propose LeFusion, a family of lesion-focused diffusion models. By redesigning the diffusion learning objectives to concentrate on lesion areas, LeFusion simplifies the model learning process and enhances the controllability of the synthetic output, while preserving the background by integrating forward-diffused background contexts into the reverse diffusion process. We also generalize it to jointly handle multi-class lesions, and introduce a generative model for lesion masks to increase synthesis diversity. On the DE-MRI cardiac lesion segmentation dataset (Emidec), experiments with the popular nnUNet show that the synthetic data can effectively improve a state-of-the-art model. Code and model are publicly available.
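The two mechanisms named in the abstract, a learning objective restricted to lesion areas and a background taken from the forward-diffused real image, can be illustrated with a minimal sketch. This is one reading of the abstract under stated assumptions, not the authors' implementation: the function names (`forward_diffuse`, `lesion_focused_loss`, `blend_with_background`), the standard noise-prediction loss, and the use of flat Python lists in place of image tensors are all hypothetical choices made for illustration.

```python
import math
import random


def forward_diffuse(x0, alpha_bar_t, rng):
    """Sample x_t from q(x_t | x_0): sqrt(a) * x0 + sqrt(1 - a) * noise per voxel."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_x = math.sqrt(alpha_bar_t)
    scale_n = math.sqrt(1.0 - alpha_bar_t)
    x_t = [scale_x * x + scale_n * n for x, n in zip(x0, noise)]
    return x_t, noise


def lesion_focused_loss(pred_noise, true_noise, lesion_mask):
    """Mean squared noise-prediction error, counted over lesion voxels only.

    Assumption: a binary mask (1 = lesion, 0 = background) gates the loss,
    so errors in the background do not contribute to training.
    """
    total = sum(m * (p - t) ** 2
                for p, t, m in zip(pred_noise, true_noise, lesion_mask))
    return total / max(sum(lesion_mask), 1)


def blend_with_background(x_generated, x0_real, lesion_mask, alpha_bar_t, rng):
    """Keep generated values inside the mask; elsewhere use the real image
    forward-diffused to the same noise level (inpainting-style blending)."""
    x_background, _ = forward_diffuse(x0_real, alpha_bar_t, rng)
    return [m * g + (1 - m) * b
            for g, b, m in zip(x_generated, x_background, lesion_mask)]
```

In a full sampler, a blend of this kind would be applied at every reverse step, so that only the masked lesion region is synthesized while the surrounding anatomy comes from the real scan.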
URL
https://arxiv.org/abs/2403.14066