Abstract
Self-supervised learning (SSL) has achieved major advances in natural image and video understanding, but challenges remain in domains like echocardiography (heart ultrasound) due to subtle anatomical structures, complex temporal dynamics, and the current lack of domain-specific pre-trained models. Existing SSL approaches, such as contrastive, masked-modeling, and clustering-based methods, struggle with high inter-sample similarity, are sensitive to the low-PSNR inputs common in ultrasound, or rely on aggressive augmentations that distort clinically relevant features. We present DISCOVR (Distilled Image Supervision for Cross Modal Video Representation), a self-supervised dual-branch framework for cardiac ultrasound video representation learning. DISCOVR combines a clustering-based video encoder that models temporal dynamics with an online image encoder that extracts fine-grained spatial semantics. These branches are connected through a semantic cluster distillation loss that transfers anatomical knowledge from the evolving image encoder to the video encoder, yielding temporally coherent representations enriched with fine-grained semantic understanding. Evaluated on six echocardiography datasets spanning fetal, pediatric, and adult populations, DISCOVR outperforms both specialized video anomaly detection methods and state-of-the-art video SSL baselines in zero-shot and linear-probing setups, and achieves superior segmentation transfer performance.
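The abstract does not give the form of the semantic cluster distillation loss. As a rough illustration only, the sketch below assumes a generic DINO/SwAV-style formulation, which is a swapped-in technique and not DISCOVR's published method: each branch's embedding is softly assigned to a shared set of prototypes (cluster centers) by temperature-scaled cosine similarity, and the video branch (student) is scored against the sharper assignment produced by the image branch (teacher) with cross-entropy. The prototype set, the temperatures, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (epsilon avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]


def softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_assignments(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    temperature: float,
) -> Vector:
    """Soft assignment of one embedding to K prototypes via cosine similarity."""
    e = l2_normalize(embedding)
    logits = [sum(a * b for a, b in zip(e, l2_normalize(p))) for p in prototypes]
    return softmax(logits, temperature)


def cluster_distillation_loss(
    video_emb: Sequence[float],
    image_emb: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    student_temp: float = 0.1,   # hypothetical value
    teacher_temp: float = 0.04,  # hypothetical value; lower = sharper target
) -> float:
    """Cross-entropy H(teacher, student) between cluster assignments.

    Assumption: the image branch is the teacher. In a real training loop its
    assignment would be treated as a constant target (stop-gradient); this
    pure-Python sketch computes only the loss value, no gradients.
    """
    teacher = cluster_assignments(image_emb, prototypes, teacher_temp)
    student = cluster_assignments(video_emb, prototypes, student_temp)
    return -sum(t * math.log(s + 1e-12) for t, s in zip(teacher, student))
```

With prototypes `[[1, 0], [0, 1], [-1, 0]]` and an image embedding of `[1, 0]`, a video embedding of `[0.9, 0.1]` gives a loss near zero, while an orthogonal video embedding `[0, 1]` gives a loss of about 10, so minimizing this quantity pulls the video branch toward the image branch's cluster structure.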
URL
https://arxiv.org/abs/2506.11777