Paper Reading AI Learner

Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks

2024-03-25 20:43:23
Ali Abedi, Shehroz S. Khan

Abstract

Engagement in virtual learning is crucial for a variety of factors, including learner satisfaction, performance, and compliance with learning programs, but measuring it is a challenging task. There is therefore considerable interest in using artificial intelligence and affective computing to measure engagement in natural settings and at large scale. This paper introduces a novel, privacy-preserving method for engagement measurement from videos. It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution. The extracted facial landmarks are fed to a Spatial-Temporal Graph Convolutional Network (ST-GCN) that outputs the engagement level of the learner in the video. To integrate the ordinal nature of the engagement variable into the training process, the ST-GCNs are trained in a novel ordinal learning framework based on transfer learning. Experimental results on two video-based student engagement measurement datasets show the superiority of the proposed method over previous methods, improving the state of the art on the EngageNet dataset by 3.1% in four-class engagement level classification accuracy and on the Online Student Engagement dataset by 1.5% in binary engagement classification accuracy. The relatively lightweight ST-GCN and its integration with the real-time MediaPipe solution make the proposed approach deployable on virtual learning platforms for real-time engagement measurement.
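The first stage of the pipeline described in the abstract, per-frame facial landmark extraction with MediaPipe, is straightforward to prototype. Below is a minimal sketch, not the authors' released code: the function name `extract_landmark_sequence` and the `max_frames` cap are illustrative, and only standard MediaPipe FaceMesh and OpenCV calls are used.

```python
import cv2
import mediapipe as mp
import numpy as np

def extract_landmark_sequence(video_path, max_frames=300):
    """Extract per-frame facial landmarks from a video.

    Returns an array of shape (T, 468, 3): T frames, the 468 FaceMesh
    landmarks, and MediaPipe's normalized (x, y, z) coordinates.
    """
    frames = []
    cap = cv2.VideoCapture(video_path)
    # FaceMesh in video (tracking) mode, one face per frame.
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                         max_num_faces=1) as face_mesh:
        while cap.isOpened() and len(frames) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_face_landmarks:
                pts = result.multi_face_landmarks[0].landmark
                frames.append([[p.x, p.y, p.z] for p in pts])
    cap.release()
    return np.asarray(frames, dtype=np.float32)
```

The resulting (frames × landmarks × coordinates) tensor is the natural node-feature layout for an ST-GCN, with one graph node per landmark; MediaPipe's `FACEMESH_TESSELATION` edge set is one plausible source for the graph's adjacency, though the paper's exact graph construction and ordinal training scheme are not specified in the abstract.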


URL

https://arxiv.org/abs/2403.17175

PDF

https://arxiv.org/pdf/2403.17175.pdf

