Paper Reading AI Learner

CNN-LSTM Hybrid Model for AI-Driven Prediction of COVID-19 Severity from Spike Sequences and Clinical Data

2025-05-29 16:20:54
Caio Cheohen, Vinnícius M. S. Gomes, Manuela L. da Silva

Abstract

The COVID-19 pandemic, caused by SARS-CoV-2, highlighted the critical need for accurate prediction of disease severity to optimize healthcare resource allocation and patient management. The spike protein, which facilitates viral entry into host cells, exhibits high mutation rates, particularly in the receptor-binding domain, influencing viral pathogenicity. Artificial intelligence approaches, such as deep learning, offer promising solutions for leveraging genomic and clinical data to predict disease outcomes. Objective: This study aimed to develop a hybrid CNN-LSTM deep learning model to predict COVID-19 severity using spike protein sequences and associated clinical metadata from South American patients. Methods: We retrieved 9,570 spike protein sequences from the GISAID database, of which 3,467 met inclusion criteria after standardization. The dataset included 2,313 severe and 1,154 mild cases. A feature engineering pipeline extracted features from sequences, while demographic and clinical variables were one-hot encoded. A hybrid CNN-LSTM architecture was trained, combining CNN layers for local pattern extraction and an LSTM layer for long-term dependency modeling. Results: The model achieved an F1 score of 82.92%, ROC-AUC of 0.9084, precision of 83.56%, and recall of 82.85%, demonstrating robust classification performance. Training stabilized at 85% accuracy with minimal overfitting. The most prevalent lineages (P.1, AY.99.2) and clades (GR, GK) aligned with regional epidemiological trends, suggesting potential associations between viral genetics and clinical outcomes. Conclusion: The CNN-LSTM hybrid model effectively predicted COVID-19 severity using spike protein sequences and clinical data, highlighting the utility of AI in genomic surveillance and precision public health. Despite limitations, this approach provides a framework for early severity prediction in future outbreaks.
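The abstract reports F1, precision, recall and ROC-AUC but does not say how they were averaged. The reported F1 (82.92%) is not the harmonic mean of the reported precision and recall, which would be about 83.20%. That gap is what you would expect if each metric had been averaged across the severe and mild classes separately (for example with macro or weighted averaging), but the abstract does not confirm this. The stdlib-only Python sketch below shows the standard definitions on made-up toy labels and scores, not the study's data. It assumes 1 = severe, 0 = mild and a 0.5 decision threshold. All function names are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed label convention (not stated in the paper): 1 = severe, 0 = mild.


def prf_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def averaged_prf(y_true: Sequence[int], y_pred: Sequence[int], average: str = "macro") -> Tuple[float, float, float]:
    """Average per-class P/R/F1: 'macro' = unweighted mean, 'weighted' = weighted by class support."""
    classes = sorted(set(y_true))
    per_class = [prf_for_class(y_true, y_pred, c) for c in classes]
    if average == "macro":
        weights = [1.0 / len(classes)] * len(classes)
    elif average == "weighted":
        n = len(y_true)
        weights = [sum(1 for t in y_true if t == c) / n for c in classes]
    else:
        raise ValueError("average must be 'macro' or 'weighted'")
    p, r, f = (sum(w * m[i] for w, m in zip(weights, per_class)) for i in range(3))
    return p, r, f


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """ROC-AUC as the Mann-Whitney statistic: the probability that a random positive
    scores higher than a random negative, with ties counted as 1/2.
    Pairwise O(n_pos * n_neg); adequate for a sketch, not for large datasets."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 5 severe and 3 mild cases with made-up model scores.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

p, r, f1 = averaged_prf(y_true, y_pred, "macro")
print(f"macro P={p:.4f} R={r:.4f} F1={f1:.4f}")  # macro P=0.6250 R=0.6333 F1=0.6190
print(f"harmonic mean of macro P and R={2 * p * r / (p + r):.4f}")  # 0.6291
print(f"ROC-AUC={roc_auc(y_true, scores):.4f}")  # ROC-AUC=0.8333
```

On the toy data, the macro F1 (0.6190) differs from the harmonic mean of macro precision and recall (0.6291). This is the same kind of gap seen in the reported figures. For context on the 85% accuracy: 2,313 of the 3,467 cases are severe. On data with that class mix, a classifier that always predicted "severe" would already reach about 66.7% accuracy.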

URL

https://arxiv.org/abs/2505.23879

PDF

https://arxiv.org/pdf/2505.23879.pdf
