Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control

2024-02-29 22:42:23
Zhiyu An, Xianzhong Ding, Wan Du

Abstract

Recent research has shown the potential of Model-based Reinforcement Learning (MBRL) to enhance energy efficiency of Heating, Ventilation, and Air Conditioning (HVAC) systems. However, existing methods rely on black-box thermal dynamics models and stochastic optimizers, lacking reliability guarantees and posing risks to occupant health. In this work, we overcome the reliability bottleneck by redesigning HVAC controllers using decision trees extracted from existing thermal dynamics models and historical data. Our decision tree-based policies are deterministic, verifiable, interpretable, and more energy-efficient than current MBRL methods. First, we introduce a novel verification criterion for RL agents in HVAC control based on domain knowledge. Second, we develop a policy extraction procedure that produces a verifiable decision tree policy. We found that the high dimensionality of the thermal dynamics model input hinders the efficiency of policy extraction. To tackle the dimensionality challenge, we leverage importance sampling conditioned on historical data distributions, significantly improving policy extraction efficiency. Lastly, we present an offline verification algorithm that guarantees the reliability of a control policy. Extensive experiments show that our method saves 68.4% more energy and increases human comfort gain by 14.8% compared to the state-of-the-art method, in addition to an 1127x reduction in computation overhead. Our code and data are available at this https URL
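
The abstract's claim that a decision tree policy can be verified offline can be illustrated with a small sketch. This is not the authors' verification algorithm or criterion. It only shows why a tree is checkable: the tree has finitely many leaves, each covering a known input range, so a rule can be tested on every leaf without running the controller. The names `Leaf`, `Split`, `leaves_with_ranges`, `verify`, the single zone-temperature input, and the 20 to 24 °C comfort band are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass
class Leaf:
    setpoint: float  # setpoint (°C) the policy commands in this leaf


@dataclass
class Split:
    threshold: float  # split on zone temperature (°C)
    below: "Node"     # subtree used when zone_temp <= threshold
    above: "Node"     # subtree used when zone_temp > threshold


Node = Union[Leaf, Split]
Region = Tuple[float, float, float]  # (lo, hi, setpoint); leaf covers lo < temp <= hi


def leaves_with_ranges(node: Node, lo: float = -math.inf,
                       hi: float = math.inf) -> Iterator[Region]:
    """Enumerate every leaf together with the temperature interval it covers."""
    if isinstance(node, Leaf):
        yield lo, hi, node.setpoint
    else:
        yield from leaves_with_ranges(node.below, lo, min(hi, node.threshold))
        yield from leaves_with_ranges(node.above, max(lo, node.threshold), hi)


def verify(tree: Node, comfort_low: float = 20.0,
           comfort_high: float = 24.0) -> List[Region]:
    """Return the leaves that break a hypothetical domain rule.

    Assumed rule: a leaf that can see temperatures below the comfort band
    must not command a setpoint below the band, and a leaf that can see
    temperatures above the band must not command a setpoint above it.
    """
    violations = []
    for lo, hi, setpoint in leaves_with_ranges(tree):
        if lo >= hi:
            continue  # empty interval: the leaf is unreachable
        too_cold = lo < comfort_low and setpoint < comfort_low
        too_hot = hi > comfort_high and setpoint > comfort_high
        if too_cold or too_hot:
            violations.append((lo, hi, setpoint))
    return violations


# Two toy policies: the first satisfies the assumed rule, the second does not.
safe_tree = Split(20.0, Leaf(22.0), Split(24.0, Leaf(21.0), Leaf(23.0)))
unsafe_tree = Split(20.0, Leaf(18.0), Leaf(22.0))

safe_violations = verify(safe_tree)
unsafe_violations = verify(unsafe_tree)
```

Because the check visits each leaf exactly once, its cost grows with the size of the tree rather than with the number of simulated control steps, which is the property that makes this kind of policy amenable to offline checking.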

URL

https://arxiv.org/abs/2403.00172

PDF

https://arxiv.org/pdf/2403.00172.pdf

