Abstract
Emerging data-driven approaches, such as deep reinforcement learning (DRL), aim at on-the-field learning of powertrain control policies that optimize fuel economy and other performance metrics. Indeed, they have shown great potential in this regard for individual vehicles on specific routes or drive cycles. However, for fleets of vehicles that must service a distribution of routes, DRL approaches struggle with learning stability issues that result in high variances and challenge their practical deployment. In this paper, we present a novel framework for shared learning among a fleet of vehicles through the use of a distilled group policy as the knowledge-sharing mechanism for the policy learning computations at each vehicle. We detail the mathematical formulation that makes this possible. Several scenarios are considered to analyze the functionality, performance, and computational scalability of the framework with fleet size. Comparisons of the cumulative performance of fleets using our proposed shared learning approach with a baseline of individual learning agents and another state-of-the-art approach with a centralized learner show clear advantages of our approach. For example, we find a fleet-average asymptotic improvement of 8.5 percent in fuel economy compared to the baseline, while also improving on the metrics of acceleration error and shifting frequency for fleets serving a distribution of suburban routes. Furthermore, we include demonstrative results that show how the framework reduces variance within a fleet and how it helps individual agents adapt better to new routes.
URL
https://arxiv.org/abs/2404.17892