Abstract
Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis. Code is available at this https URL.
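The augmentation idea in the abstract, training a classifier on real videos combined with synthesized ones, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `mix_training_set`, the `synth_ratio` parameter, and the `(identifier, label)` sample format are all hypothetical, chosen only to show how a mixed training list might be assembled.

```python
import random
from typing import List, Tuple

# Hypothetical sample format: (video identifier, class label).
Sample = Tuple[str, int]


def mix_training_set(
    real: List[Sample],
    synthetic: List[Sample],
    synth_ratio: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Combine real and synthesized samples into one shuffled training list.

    synth_ratio is an assumed knob: the number of synthetic samples drawn
    per real sample, capped by how many synthetic samples exist.
    """
    if synth_ratio < 0:
        raise ValueError("synth_ratio must be non-negative")
    rng = random.Random(seed)
    # Number of synthetic samples to add, never more than are available.
    n_synth = min(len(synthetic), int(round(synth_ratio * len(real))))
    chosen = rng.sample(synthetic, n_synth)
    # Keep every real sample, then shuffle so batches mix both sources.
    mixed = list(real) + chosen
    rng.shuffle(mixed)
    return mixed
```

Setting `synth_ratio=0` reproduces the real-only baseline, so the same function can drive both arms of a real-versus-augmented comparison.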
URL
https://arxiv.org/abs/2503.14966