Paper Reading AI Learner

Wavelet-Driven Masked Multiscale Reconstruction for PPG Foundation Models

2026-01-18 01:34:47
Megha Thukral, Cyrus Tanade, Simon A. Lee, Juhyeon Lee, Hao Zhou, Keum San Chun, Migyeong Gwak, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Mehrab Bin Morshed, Subramaniam Venkatraman, Sharanya Arcot Desai

Abstract

Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. While recent progress has been made in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, in which physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features, spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning: a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is to reconstruct randomly masked-out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR on ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscore the value of wavelet-based representations, showing that they capture robust, physiologically grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models.
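The pretext task described above (wavelet decomposition, random masking of coefficients, reconstruction of the hidden ones) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption, not taken from the paper: an orthonormal Haar wavelet, 4 decomposition levels, a hypothetical 25.6 Hz sampling rate so that a 10-second segment has 256 samples, uniform random masking of 50% of the coefficients in every band (hidden values set to 0.0), and a mean-squared-error loss computed only on masked coefficients. The transformer encoder is omitted, and the helper names `haar_dwt`, `haar_idwt`, `mask_coefficients`, and `masked_mse` are hypothetical.

```python
import math
import random
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)
Bands = List[List[float]]


def haar_dwt(signal: List[float], levels: int) -> Bands:
    """Multilevel orthonormal Haar DWT (assumed wavelet; the paper's may differ).

    Returns [cA_L, cD_L, ..., cD_1]: the coarsest approximation band followed
    by detail bands ordered from coarse (slow rhythms) to fine (sharp morphology).
    """
    if levels < 1 or not signal or len(signal) % (2 ** levels) != 0:
        raise ValueError("need levels >= 1 and len(signal) divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append([(e - o) / SQRT2 for e, o in zip(even, odd)])
        approx = [(e + o) / SQRT2 for e, o in zip(even, odd)]
    return [approx] + details[::-1]


def haar_idwt(coeffs: Bands) -> List[float]:
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        out = []
        for a, d in zip(approx, detail):
            out.append((a + d) / SQRT2)  # even sample
            out.append((a - d) / SQRT2)  # odd sample
        approx = out
    return approx


def mask_coefficients(coeffs: Bands, mask_ratio: float,
                      rng: random.Random) -> Tuple[Bands, List[List[bool]]]:
    """Hide a random fraction of coefficients in every band (set them to 0.0).

    Returns (masked_coeffs, masks); masks[b][i] is True where the coefficient
    was hidden from the encoder and must be reconstructed.
    """
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    masked, masks = [], []
    for band in coeffs:
        n_mask = max(1, int(mask_ratio * len(band)))
        hidden = set(rng.sample(range(len(band)), n_mask))
        masks.append([i in hidden for i in range(len(band))])
        masked.append([0.0 if i in hidden else c for i, c in enumerate(band)])
    return masked, masks


def masked_mse(pred: Bands, target: Bands, masks: List[List[bool]]) -> float:
    """Mean squared error over masked coefficients only."""
    total, count = 0.0, 0
    for p_band, t_band, m_band in zip(pred, target, masks):
        for p, t, m in zip(p_band, t_band, m_band):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 25.6  # hypothetical sampling rate: 10 s -> 256 samples
    x = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(256)]  # toy ~72 bpm pulse
    coeffs = haar_dwt(x, levels=4)
    masked, masks = mask_coefficients(coeffs, mask_ratio=0.5, rng=rng)
    print([len(b) for b in coeffs])  # [16, 16, 32, 64, 128]
    # A real model would predict `coeffs` from `masked`; the masked input
    # itself stands in here as a trivial baseline prediction.
    print(masked_mse(masked, coeffs, masks))
```

Restricting the loss to hidden coefficients, as in other masked-reconstruction objectives, keeps the model from lowering it by copying visible inputs, and masking every band means it must recover both coarse rhythmic and fine morphological information.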

URL

https://arxiv.org/abs/2601.12215

PDF

https://arxiv.org/pdf/2601.12215.pdf