Paper Reading AI Learner

RareGraph-Synth: Knowledge-Guided Diffusion Models for Generating Privacy-Preserving Synthetic Patient Trajectories in Ultra-Rare Diseases

2025-10-06 03:59:09
Khartik Uppalapati, Shakeel Abdulkareem, Bora Yimenicioglu

Abstract

We propose RareGraph-Synth, a knowledge-guided, continuous-time diffusion framework that generates realistic yet privacy-preserving synthetic electronic-health-record (EHR) trajectories for ultra-rare diseases. RareGraph-Synth unifies five public resources into a heterogeneous knowledge graph (KG) comprising approximately 8 million typed edges: Orphanet/Orphadata, the Human Phenotype Ontology (HPO), the GARD rare-disease KG, PrimeKG, and the FDA Adverse Event Reporting System (FAERS). Meta-path scores extracted from this KG modulate the per-token noise schedule in the forward stochastic differential equation (SDE), steering generation toward biologically plausible lab-medication-adverse-event co-occurrences while retaining the stability of score-based diffusion models. The reverse denoiser then produces timestamped sequences of (lab code, medication code, adverse-event flag) triples that contain no protected health information. On simulated ultra-rare-disease cohorts, RareGraph-Synth lowers categorical Maximum Mean Discrepancy (MMD) by 40% relative to an unguided diffusion baseline and by more than 60% relative to GAN counterparts, without sacrificing downstream predictive utility. A black-box membership-inference evaluation using the DOMIAS attacker yields an AUROC of approximately 0.53, well below the 0.55 safe-release threshold and substantially better than the approximately 0.61 ± 0.03 observed for non-KG baselines, demonstrating strong resistance to re-identification. These results suggest that integrating biomedical knowledge graphs directly into diffusion noise schedules can simultaneously enhance fidelity and privacy, enabling safer data sharing for rare-disease research.
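
The abstract does not give the functional form by which meta-path scores modulate the noise schedule, so the following is only a minimal sketch of one way a per-token schedule could work, not the authors' method. It assumes a variance-preserving SDE with a linear base schedule, and a hypothetical rule in which a token with a higher KG score (in [0, 1]) has its schedule scaled down and therefore receives less noise at a given time. The names `BETA_MIN`, `BETA_MAX`, `token_scale`, `strength`, `marginal`, and `perturb` are all illustrative assumptions.

```python
import math
import random

# Assumed bounds of a linear base schedule beta(t) on t in [0, 1].
BETA_MIN, BETA_MAX = 0.1, 20.0


def integrated_beta(t: float) -> float:
    """Integral of beta(s) = BETA_MIN + s * (BETA_MAX - BETA_MIN) over [0, t]."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t


def token_scale(score: float, strength: float = 0.5) -> float:
    """Hypothetical modulation rule: higher KG score -> smaller schedule scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must lie in [0, 1) so the scale stays positive")
    return 1.0 - strength * score


def marginal(t: float, score: float, strength: float = 0.5) -> tuple[float, float]:
    """Mean coefficient and standard deviation of x_t given x_0 for one token.

    For a variance-preserving SDE with per-token schedule c * beta(t):
    x_t = exp(-0.5 * c * B(t)) * x_0 + sqrt(1 - exp(-c * B(t))) * eps.
    """
    b = token_scale(score, strength) * integrated_beta(t)
    return math.exp(-0.5 * b), math.sqrt(1.0 - math.exp(-b))


def perturb(x0, scores, t, rng, strength=0.5):
    """Sample the noised value of each token, one KG score per token."""
    noised = []
    for x, s in zip(x0, scores):
        mean_coeff, std = marginal(t, s, strength)
        noised.append(mean_coeff * x + std * rng.gauss(0.0, 1.0))
    return noised


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy token embeddings (scalars) with low, medium, and high KG scores.
    print(perturb([1.0, 1.0, 1.0], [0.0, 0.5, 1.0], t=0.3, rng=rng))
```

Under these assumptions the per-token marginals stay variance-preserving (the squared mean coefficient and the variance sum to one), which is one way to read the abstract's remark about retaining the stability of score-based diffusion while still letting the KG shape how quickly each token is corrupted.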

URL

https://arxiv.org/abs/2510.06267

PDF

https://arxiv.org/pdf/2510.06267.pdf

