Abstract
Ensuring the safety and well-being of elderly and vulnerable populations in assisted living environments is a critical concern. Computer vision offers an innovative and powerful approach to predicting health risks through video monitoring, employing human action recognition (HAR) technology. However, predicting human actions in real time with both high accuracy and high efficiency remains a challenge. This research proposes a real-time human action recognition system that combines a deep learning model with a live video prediction and alert pipeline to detect falls, staggering, and chest pain among residents in assisted living. Six thousand RGB video samples from the NTU RGB+D 60 dataset were selected to create a dataset with four classes: Falling, Staggering, Chest Pain, and Normal, with the Normal class comprising 40 daily activities. Transfer learning was applied to train four state-of-the-art HAR models on a GPU server: UniFormerV2, TimeSformer, I3D, and SlowFast. The four models are compared on class-wise and macro performance metrics, inference efficiency, model complexity, and computational cost. TimeSformer is selected for the real-time recognition system, owing to its leading macro F1 score (95.33%), recall (95.49%), and precision (95.19%), together with markedly higher inference throughput than the other models. This research provides insights for enhancing the safety and health of the elderly and people with chronic illnesses in assisted living environments, fostering sustainable care, smarter communities, and industry innovation.
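As context for the pipeline the abstract summarizes, below is a minimal sketch (not the authors' code) of the transfer-learning setup it describes: taking a Kinetics-pretrained TimeSformer, replacing its classification head for the four classes (Falling, Staggering, Chest Pain, Normal), and reporting the macro-averaged metrics quoted above. The checkpoint name, learning rate, and helper functions are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of TimeSformer transfer learning for 4-class HAR.
# Assumptions: Hugging Face `transformers` TimeSformer checkpoint,
# AdamW with an arbitrary learning rate, 8-frame 224x224 RGB clips.
import torch
from transformers import TimesformerForVideoClassification
from sklearn.metrics import f1_score, precision_score, recall_score

NUM_CLASSES = 4  # Falling, Staggering, Chest Pain, Normal

# Swap the 400-class Kinetics head for a freshly initialized 4-class head.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",  # assumed checkpoint
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed LR


def train_step(clips: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step; `clips` has shape (batch, frames, 3, 224, 224)."""
    model.train()
    outputs = model(pixel_values=clips, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()


@torch.no_grad()
def macro_metrics(clips: torch.Tensor, labels: torch.Tensor) -> dict:
    """Macro F1, recall, and precision, the metrics the abstract reports."""
    model.eval()
    preds = model(pixel_values=clips).logits.argmax(dim=-1)
    y_true, y_pred = labels.numpy(), preds.numpy()
    return {
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "macro_recall": recall_score(y_true, y_pred, average="macro"),
        "macro_precision": precision_score(y_true, y_pred, average="macro"),
    }
```

The paper's training ran on a GPU server over the full 6,000-sample dataset; this sketch keeps only the pieces needed to illustrate head replacement and macro-metric computation.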
URL
https://arxiv.org/abs/2503.18957