Paper Reading AI Learner

Deep Learning for Video-Based Assessment of Endotracheal Intubation Skills

2024-04-17 20:28:15
Jean-Paul Ainam, Erim Yanik, Rahul Rahul, Taylor Kunkes, Lora Cavuoto, Brian Clemency, Kaori Tanaka, Matthew Hackett, Jack Norfleet, Suvranu De

Abstract

Endotracheal intubation (ETI) is an emergency procedure performed in civilian and combat casualty care settings to establish an airway. Objective and automated assessment of ETI skills is essential for the training and certification of healthcare providers. However, the current approach is based on manual feedback by an expert, which is subjective, time- and resource-intensive, and prone to poor inter-rater reliability and halo effects. This work proposes a framework to evaluate ETI skills using single- and multi-view videos. The framework consists of two stages. First, a 2D convolutional autoencoder (AE) and a pre-trained self-supervised network extract features from videos. Second, a 1D convolutional network enhanced with a cross-view attention module takes the features from the AE as input and outputs predictions for skill evaluation. The ETI datasets were collected in two phases. In the first phase, ETI is performed by two subject cohorts: Experts and Novices. In the second phase, novice subjects perform ETI under time pressure, and the outcome is either Successful or Unsuccessful. A third dataset of videos from a single head-mounted camera for Experts and Novices is also analyzed. The study achieved an accuracy of 100% in identifying Expert/Novice trials in the initial phase. In the second phase, the model showed 85% accuracy in classifying Successful/Unsuccessful procedures. Using the head-mounted camera alone, the model achieved 96% accuracy on Expert/Novice classification while maintaining 85% accuracy in classifying Successful/Unsuccessful trials. In addition, GradCAM visualizations are presented to explain the differences between Expert and Novice behavior and between Successful and Unsuccessful trials. The approach offers a reliable and objective method for automated assessment of ETI skills.
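
The abstract describes a two-stage architecture: per-frame features from a 2D convolutional autoencoder, then a 1D convolutional network over time with a cross-view attention module for classification. The sketch below is a minimal, hypothetical PyTorch illustration of that pipeline, not the authors' released code: the layer sizes, the attention formulation, and all names are illustrative assumptions, and the pre-trained self-supervised feature extractor and GradCAM step are omitted.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All layer sizes and names are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class FrameAutoencoder(nn.Module):
    """Stage 1: per-frame 2D conv autoencoder; the encoder output is the feature."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.decoder = nn.Sequential(  # used only for reconstruction pre-training
            nn.Linear(feat_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):            # frames: (B*T, 3, H, W)
        return self.encoder(frames)       # (B*T, feat_dim)


class CrossViewSkillClassifier(nn.Module):
    """Stage 2: cross-view attention fusion, then a 1D conv over time and a class score."""

    def __init__(self, feat_dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, 64, 5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, feats):             # feats: (B, n_views, T, feat_dim)
        b, v, t, d = feats.shape
        x = feats.permute(0, 2, 1, 3).reshape(b * t, v, d)   # attend across views per time step
        fused, _ = self.attn(x, x, x)                        # cross-view attention
        fused = fused.mean(dim=1).reshape(b, t, d)           # (B, T, feat_dim)
        logits = self.head(self.temporal(fused.transpose(1, 2)))
        return logits                                        # e.g. Expert vs. Novice


# Toy forward pass: 2 videos, 2 views, 16 frames of 64x64 RGB each.
ae = FrameAutoencoder()
clf = CrossViewSkillClassifier()
video = torch.rand(2, 2, 16, 3, 64, 64)
feats = ae(video.view(-1, 3, 64, 64)).view(2, 2, 16, -1)
print(clf(feats).shape)  # torch.Size([2, 2])
```

For the single head-mounted camera setting, the same classifier could be run with one view, in which case the attention step reduces to self-attention over a single view's features.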

Abstract (translated)

Endotracheal intubation (ETI) is an emergency procedure performed in civilian and combat casualty care settings to establish an airway. Objective and automated assessment of ETI skills is essential for the training and certification of healthcare providers. However, the current approach relies on manual feedback from an expert, which is subjective, time- and resource-intensive, and prone to poor inter-rater reliability and halo effects. This work proposes a framework for evaluating ETI skills from single- and multi-view videos. The framework consists of two stages. In the first stage, a 2D convolutional autoencoder (AE) and a pre-trained self-supervised network extract features from the videos. In the second stage, a 1D convolutional network with a cross-view attention module takes the AE features as input and outputs skill-evaluation predictions. The ETI datasets were collected in two phases. In the first phase, ETI was performed by two subject cohorts: Experts and Novices. In the second phase, novice subjects performed ETI under time pressure, with the outcome labeled Successful or Unsuccessful. A further dataset of videos from a single head-mounted camera, collected from Experts and Novices, was also analyzed. The study achieved 100% accuracy in identifying Expert/Novice trials in the first phase. In the second phase, the model achieved 85% accuracy in classifying Successful/Unsuccessful procedures. Using the head-mounted camera alone, the model achieved 96% accuracy on Expert/Novice classification while maintaining 85% accuracy on Successful/Unsuccessful classification. In addition, GradCAM visualizations are presented to explain the differences between Expert and Novice behavior and between Successful and Unsuccessful trials. The approach offers a reliable and objective method for the automated assessment of ETI skills.

URL

https://arxiv.org/abs/2404.11727

PDF

https://arxiv.org/pdf/2404.11727.pdf
