Abstract
The widespread adoption of electric vehicles (EVs) poses several challenges to power distribution networks and smart-grid infrastructure due to the possibility of significantly increased electricity demand, especially during peak hours. Furthermore, when EVs participate in demand-side management programs, charging expenses can be reduced through optimal charging control policies that fully exploit real-time pricing schemes. However, devising optimal charging methods and control strategies for EVs is challenging due to various stochastic and uncertain environmental factors. Currently, most EV charging controllers operate on a centralized model. In this paper, we introduce a novel distributed and cooperative charging strategy based on a Multi-Agent Reinforcement Learning (MARL) framework. Our method builds on the Deep Deterministic Policy Gradient (DDPG) algorithm for a group of EVs in a residential community, where all EVs are connected to a shared transformer. This method, referred to as CTDE-DDPG, adopts a Centralized Training Decentralized Execution (CTDE) approach to establish cooperation between agents during the training phase, while ensuring distributed, privacy-preserving operation during execution. We theoretically examine the performance of centralized and decentralized critics for the DDPG-based MARL implementation and demonstrate their trade-offs. Furthermore, we numerically explore the efficiency, scalability, and performance of centralized and decentralized critics. Our theoretical and numerical results indicate that, despite higher policy-gradient variance and training complexity, the CTDE-DDPG framework significantly improves charging efficiency, reducing total variation by approximately 36% and charging cost by around 9.1% on average...
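To make the CTDE idea concrete, the following is a minimal sketch (not the paper's actual architecture): each EV agent has its own actor that maps only its local observation to a deterministic charging rate (DDPG-style), while a centralized critic used only during training scores the joint observations and actions of all agents. The agent count, observation features, and linear function approximators here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3   # EVs sharing one transformer (illustrative)
OBS_DIM = 4    # e.g., state of charge, price, time, transformer load (assumed)
ACT_DIM = 1    # charging rate

class Actor:
    """Decentralized actor: maps the agent's LOCAL observation to a
    deterministic, bounded charging action (DDPG-style)."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))

    def act(self, local_obs):
        # tanh keeps the charging rate in (-1, 1)
        return np.tanh(self.W @ local_obs)

class CentralizedCritic:
    """CTDE critic: during training it evaluates the JOINT
    observation-action vector of all agents; at execution time it is
    discarded, so each EV acts on local information only."""
    def __init__(self):
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.w = rng.normal(scale=0.1, size=joint_dim)

    def q(self, all_obs, all_acts):
        joint = np.concatenate([*all_obs, *all_acts])
        return float(self.w @ joint)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralizedCritic()

# Decentralized execution: each actor sees only its own observation.
obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
acts = [actor.act(o) for actor, o in zip(actors, obs)]

# Centralized training signal: the critic sees everything.
q_value = critic.q(obs, acts)
```

In a fully decentralized (I-DDPG-style) variant, each agent would instead train against its own critic that sees only its local observation and action; the abstract's trade-off is that the centralized critic captures inter-agent coupling (the shared transformer) at the cost of higher policy-gradient variance and training complexity.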
URL
https://arxiv.org/abs/2404.12520