Abstract
Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, using a tree-shaped MDP to optimize complete routes while retaining single-step accuracy. Desirable routes should be both synthesizable and of low cost. We propose an online training algorithm, called Planning with Dual Value Networks (PDVN), in which two value networks predict the synthesizability and cost of molecules, respectively. To maintain single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro*, and reducing the number of model calls by half while solving 99.47% of molecules for RetroGraph). Furthermore, PDVN finds shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph).
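To make the two quantities concrete, the sketch below computes exact synthesizability and minimum route cost on a tiny hand-written reaction tree. It is only an illustration of what the two value networks could be read as approximating, not the PDVN training algorithm itself. The molecule names, the reaction table, the unit cost per reaction step, and the function names `synthesizable` and `min_cost` are all hypothetical and do not come from the paper.

```python
from functools import lru_cache
from math import inf

# Hypothetical toy data: molecule names and reactions are illustrative only.
BUILDING_BLOCKS = {"A", "B", "C"}  # commercially available starting materials

# Each molecule maps to its candidate single-step reactions;
# each reaction is the tuple of reactants it requires.
# The toy graph is acyclic, so plain recursion terminates.
REACTIONS = {
    "T": [("M1", "A"), ("M2",)],
    "M1": [("B", "C")],
    "M2": [("X",)],  # "X" can be neither bought nor made: a dead end
}

REACTION_COST = 1.0  # assumption: every reaction step costs one unit


@lru_cache(maxsize=None)
def synthesizable(mol: str) -> bool:
    """A molecule is synthesizable if it is purchasable, or if at least one
    reaction exists whose reactants are all synthesizable."""
    if mol in BUILDING_BLOCKS:
        return True
    return any(
        all(synthesizable(r) for r in reactants)
        for reactants in REACTIONS.get(mol, [])
    )


@lru_cache(maxsize=None)
def min_cost(mol: str) -> float:
    """Cheapest route cost: zero for building blocks, otherwise the minimum
    over reactions of the step cost plus the summed reactant costs."""
    if mol in BUILDING_BLOCKS:
        return 0.0
    best = inf
    for reactants in REACTIONS.get(mol, []):
        best = min(best, REACTION_COST + sum(min_cost(r) for r in reactants))
    return best


if __name__ == "__main__":
    print(synthesizable("T"), min_cost("T"))
    print(synthesizable("M2"), min_cost("M2"))
```

In this toy tree, the target `T` is reachable only through `M1`, so its cheapest route uses two reactions, while `M2` has no valid route and receives infinite cost. With unit step costs, the total counts the reactions in a route, which is one simple way to tie "low cost" to the shorter routes the abstract reports.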
Abstract (translated)
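Retrosynthesis aims to find a route that synthesizes a target molecule from commercially available starting materials, and it is a key problem in drug discovery and materials design. Recently, combining machine-learning-based single-step reaction predictors with multi-step planners has produced promising results. However, single-step predictors are mostly trained offline to optimize single-step accuracy, without taking complete routes into account. Here, we use reinforcement learning (RL) to improve the single-step predictor, optimizing complete routes with a tree-shaped MDP while preserving single-step accuracy. Desirable routes are both synthesizable and low in cost. We propose an online training algorithm called Planning with Dual Value Networks (PDVN), in which two value networks predict the synthesizability and the cost of molecules, respectively. To preserve single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely used USPTO dataset, the PDVN algorithm raises the search success rate of existing multi-step planners (e.g., from 85.79% to 98.95% for Retro*, and halving the number of model calls while solving 99.47% of molecules for RetroGraph). In addition, PDVN finds shorter synthesis routes (e.g., the average route length drops from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph).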
URL
https://arxiv.org/abs/2301.13755