
Memory-efficient Low-latency Remote Photoplethysmography through Temporal-Spatial State Space Duality

2025-04-02 14:34:04
Kegang Wang, Jiankai Tang, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Yuntao Wang

Abstract

Remote photoplethysmography (rPPG), enabling non-contact physiological monitoring through facial light reflection analysis, faces critical computational bottlenecks as deep learning introduces performance gains at the cost of prohibitive resource demands. This paper proposes ME-rPPG, a memory-efficient algorithm built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time constraints. Leveraging a transferable state space, ME-rPPG efficiently captures subtle periodic variations across facial frames while maintaining minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. Achieving cross-dataset MAEs of 5.38 (MMPD), 0.70 (VitalVideo), and 0.25 (PURE), ME-rPPG outperforms all baselines with improvements ranging from 21.3% to 60.2%. Our solution enables real-time inference with only 3.6 MB memory usage and 9.46 ms latency, surpassing existing methods by 19.5%-49.7% in accuracy and yielding 43.2% gains in user satisfaction in real-world deployments. The code and demos are released for reproducibility on this https URL.
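To make the memory-efficiency claim concrete, the sketch below illustrates the kind of linear state-space recurrence that underlies state-space-duality models: per-frame facial features are folded into a fixed-size hidden state, so inference memory stays constant regardless of video length. This is a minimal, hypothetical illustration; the dimensions and parameters (D_FEAT, D_STATE, A, B, C) are assumptions for demonstration and are not ME-rPPG's actual architecture or learned weights.

```python
import numpy as np

# Illustrative dimensions; the paper's actual model sizes are not given in the abstract.
D_FEAT = 16    # per-frame facial feature dimension (assumed)
D_STATE = 8    # latent state size per feature channel (assumed)

rng = np.random.default_rng(0)

# Diagonal state-space parameters (random placeholders; these would be learned in practice).
A = np.exp(-rng.uniform(0.01, 0.5, size=(D_FEAT, D_STATE)))  # per-dimension decay, |A| < 1 for stability
B = rng.standard_normal((D_FEAT, D_STATE)) * 0.1              # input projection
C = rng.standard_normal((D_FEAT, D_STATE)) * 0.1              # output projection

def rppg_stream(frame_features):
    """Stream per-frame features through a linear state-space recurrence.

    Only the hidden state is carried between frames, so memory stays
    O(D_FEAT * D_STATE) no matter how long the video is -- one source of the
    memory-efficient, low-latency behaviour described in the abstract.
    """
    h = np.zeros((D_FEAT, D_STATE))            # persistent recurrent state
    for x_t in frame_features:                 # x_t: (D_FEAT,) features for one frame
        h = A * h + B * x_t[:, None]           # elementwise (diagonal) state update
        y_t = np.sum(C * h, axis=-1).mean()    # collapse to one rPPG sample per frame
        yield y_t

# Usage example: 300 frames (~10 s at 30 fps) of synthetic per-frame features.
frames = rng.standard_normal((300, D_FEAT))
bvp = np.fromiter(rppg_stream(frames), dtype=float)
print(bvp.shape)  # (300,) -- one blood-volume-pulse sample per frame
```

Because the recurrence is linear and diagonal, the same model can also be trained in a parallel (convolution- or scan-style) form over long sequences and then run frame-by-frame at inference, which is the general trade-off that state space duality exploits.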


URL

https://arxiv.org/abs/2504.01774

PDF

https://arxiv.org/pdf/2504.01774.pdf

