Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos

2025-02-17 06:35:37
Xiangxiang Cui, Zhongyu Li, Xiayue Fan, Peng Huang, Ying Wang, Meng Yang, Shi Chang, Jihua Zhu

Abstract

The intersection of medical imaging and artificial intelligence has become an important research direction in intelligent healthcare, particularly the use of deep learning to analyze medical images for clinical diagnosis. Despite these advances, existing keyframe classification methods do not extract time-series features, while ultrasound video classification based on three-dimensional convolution requires the same number of frames for every patient, which results in poor feature extraction efficiency and poor classification performance. This study proposes a novel video classification method based on a CNN and an LSTM, introducing into video classification, for the first time, the scheme NLP uses to process long and short sentences together. The method reduces the CNN-extracted image features to 1x512 vectors, then sorts and compresses the feature vectors for LSTM training. Specifically, feature vectors are sorted by the number of frames in each patient's video and padded with the value 0 to form variable batches; the invalid padding values are then compressed out before LSTM training to conserve computing resources. Experimental results show that the variable-frame CNNLSTM method outperforms the other approaches across all metrics, improving F1 score by 3-6% and specificity by 1.5% compared to keyframe methods. The variable-frame CNNLSTM also achieves better accuracy and precision than the equal-frame CNNLSTM. These findings validate the effectiveness of the approach in classifying variable-frame ultrasound videos and suggest potential applications in other medical imaging modalities.
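
The sort, pad, and compress steps described in the abstract can be illustrated with a small dependency-free sketch. This is not the authors' code: the function name `sort_pad_pack`, its return values, and the toy 2-dimensional features (standing in for the 1x512 vectors) are assumptions made for illustration. The packed layout mirrors the one produced by PyTorch's `torch.nn.utils.rnn.pack_padded_sequence`, although the abstract does not state which framework was used.

```python
from typing import List, Sequence

Vector = List[float]


def sort_pad_pack(videos: Sequence[Sequence[Vector]], pad_value: float = 0.0):
    """Sort videos by frame count, pad with pad_value, then drop the padding.

    videos[i] holds the per-frame feature vectors of patient i (hypothetical
    layout). Returns (order, lengths, padded, packed, batch_sizes).
    """
    # 1. Sort longest video first, remembering each video's original index.
    order = sorted(range(len(videos)), key=lambda i: len(videos[i]), reverse=True)
    lengths = [len(videos[i]) for i in order]
    max_len = lengths[0] if lengths else 0
    dim = len(videos[order[0]][0]) if max_len else 0

    # 2. Pad every video with zero vectors up to the longest length.
    padded = [
        [list(frame) for frame in videos[i]]
        + [[pad_value] * dim for _ in range(max_len - len(videos[i]))]
        for i in order
    ]

    # 3. Compress: walk time step by time step and keep only the videos that
    #    still have a real frame at that step, so no padding reaches the LSTM.
    packed: List[Vector] = []
    batch_sizes: List[int] = []
    for t in range(max_len):
        active = sum(1 for n in lengths if n > t)  # videos longer than t
        batch_sizes.append(active)
        packed.extend(padded[b][t] for b in range(active))

    return order, lengths, padded, packed, batch_sizes
```

For three videos with 2, 3, and 1 frames, the padded batch holds 9 vectors while the packed form keeps only the 6 real ones, with per-step batch sizes of 3, 2, and 1. The sorted order is what makes the active videos at each step a contiguous prefix of the batch.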

URL

https://arxiv.org/abs/2502.11481

PDF

https://arxiv.org/pdf/2502.11481.pdf

