Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models

2025-03-19 07:58:43
Tingxiu Chen, Yilei Shi, Zixuan Zheng, Bingcong Yan, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

Abstract

Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis. Code is available at this https URL.
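
The augmentation idea in the abstract, training a classifier on a combination of real and synthesized videos, can be illustrated with a minimal sketch. This is not the authors' code: the function name `mix_real_and_synthetic`, the `synthetic_ratio` parameter, and the `(video_id, label)` sample format are all assumptions made for illustration, and the abstract does not state what mixing ratio was used.

```python
import random
from typing import List, Tuple

# Hypothetical sample format: (video identifier, class label).
Sample = Tuple[str, int]


def mix_real_and_synthetic(
    real: List[Sample],
    synthetic: List[Sample],
    synthetic_ratio: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Build a shuffled training list from real and synthesized videos.

    All real samples are kept. `synthetic_ratio` (an assumed knob, not
    taken from the paper) is the number of synthetic samples to add per
    real sample, capped at the number of synthetic samples available.
    """
    if synthetic_ratio < 0:
        raise ValueError("synthetic_ratio must be non-negative")

    rng = random.Random(seed)  # local RNG so the result is reproducible
    n_synthetic = min(len(synthetic), round(synthetic_ratio * len(real)))
    chosen = rng.sample(synthetic, n_synthetic)

    mixed = list(real) + chosen
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real_videos = [("real_0", 0), ("real_1", 1)]
    synthetic_videos = [(f"syn_{i}", i % 2) for i in range(5)]
    train_set = mix_real_and_synthetic(real_videos, synthetic_videos, synthetic_ratio=1.0)
    print(len(train_set), "training samples")
```

In practice the resulting list would feed whatever video data loader the classifier uses; a ratio of 0 reproduces the real-data-only baseline that the abstract compares against.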

URL

https://arxiv.org/abs/2503.14966

PDF

https://arxiv.org/pdf/2503.14966.pdf

