Abstract
The intersection of medical imaging and artificial intelligence has become an important research direction in intelligent healthcare, particularly the use of deep learning to analyze medical images for clinical diagnosis. Despite recent advances, existing keyframe classification methods do not extract temporal (time-series) features. Meanwhile, ultrasound video classification based on three-dimensional convolution requires every patient's video to have the same number of frames, which makes feature extraction inefficient and degrades classification performance. This study proposes a video classification method based on a CNN and an LSTM. For the first time, it brings the variable-length sentence handling scheme from natural language processing (NLP) into video classification. A CNN first extracts features from each frame and reduces them to a 1×512 vector. The resulting feature vectors are then sorted and compressed before LSTM training. Specifically, the patients' feature sequences are sorted by the number of frames in each video and padded with zeros to form variable-length batches. The padding entries are then compressed out before LSTM training to save computation. Experimental results show that the variable-frame CNNLSTM method outperforms the compared approaches on all metrics, improving F1 score by 3-6% and specificity by 1.5% over keyframe methods. The variable-frame CNNLSTM also achieves higher accuracy and precision than an equal-frame CNNLSTM. These findings support the effectiveness of the approach for classifying ultrasound videos with varying frame counts and suggest possible applications to other medical imaging modalities.
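The sorting, zero-padding and compression steps described above can be sketched as follows. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation. The helper names `sort_and_pad` and `pack` are hypothetical. The per-frame feature size is shrunk from 512 to 2 for readability, and the CNN feature extractor and the LSTM itself are omitted. The packed output holds only the valid frames in time-major order, plus the number of still-active sequences at each time step. This mirrors the `data`/`batch_sizes` layout of PyTorch's `PackedSequence`, which `torch.nn.utils.rnn.pack_padded_sequence` produces. The abstract does not say which framework the authors used.

```python
from typing import List, Tuple

Frame = List[float]   # one CNN feature vector per frame (1x512 in the paper; 2-D here by assumption)
Video = List[Frame]   # one patient's video as a sequence of frame features


def sort_and_pad(videos: List[Video], pad_value: float = 0.0) -> Tuple[List[Video], List[int], List[int]]:
    """Hypothetical helper: sort videos by frame count (longest first), zero-pad to the longest."""
    if not videos:
        return [], [], []
    # Indices of the videos, ordered from most to fewest frames.
    order = sorted(range(len(videos)), key=lambda i: len(videos[i]), reverse=True)
    lengths = [len(videos[i]) for i in order]
    max_len = lengths[0]
    dim = len(videos[order[0]][0])
    padded = []
    for i in order:
        frames = [list(f) for f in videos[i]]
        # Append zero vectors so every video in the batch has max_len frames.
        frames += [[pad_value] * dim for _ in range(max_len - len(frames))]
        padded.append(frames)
    return padded, lengths, order


def pack(padded: List[Video], lengths: List[int]) -> Tuple[List[Frame], List[int]]:
    """Hypothetical helper: drop padding, keeping valid frames time-major plus batch size per step."""
    data: List[Frame] = []
    batch_sizes: List[int] = []
    for t in range(lengths[0] if lengths else 0):
        # Lengths are sorted in descending order, so the videos that still have
        # a real frame at step t are exactly the first `active` rows.
        active = sum(1 for n in lengths if n > t)
        batch_sizes.append(active)
        data.extend(padded[b][t] for b in range(active))
    return data, batch_sizes


# Toy batch: three patients with 2, 4 and 3 frames.
videos = [
    [[1.0, 1.0], [2.0, 2.0]],
    [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0], [6.0, 6.0]],
    [[7.0, 7.0], [8.0, 8.0], [9.0, 9.0]],
]
padded, lengths, order = sort_and_pad(videos)
data, batch_sizes = pack(padded, lengths)
print(lengths, batch_sizes, len(data))  # [4, 3, 2] [3, 3, 2, 1] 9
```

Because the sequences are sorted longest first, the sequences still active at time step t always form a prefix of the batch. An LSTM can therefore process only the first `batch_sizes[t]` rows at each step. In this toy batch, the 3 zero-padding frames never enter `data`: 9 valid frames are kept out of 12 padded slots.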
URL
https://arxiv.org/abs/2502.11481