Abstract
Designing molecular structures with desired chemical properties is an essential task in drug discovery and material design. However, finding molecules that optimize the desired properties remains challenging because of the combinatorial explosion of the candidate space of molecules. Here we propose a novel \emph{decomposition-and-reassembling} based approach, which does not involve any optimization in hidden space and whose generation process is highly interpretable. Our method is a two-step procedure. In the first, decomposition step, we apply frequent subgraph mining to a molecular database to collect small subgraphs that serve as building blocks of molecules. In the second, reassembling step, we search for desirable building blocks, guided by reinforcement learning, and combine them to generate new molecules. Our experiments show that our method not only finds better molecules in terms of two standard criteria, the penalized $\log P$ and drug-likeness, but also generates drug molecules while showing the valid intermediate molecules along the way.
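To make the two-step procedure concrete, the following is a minimal toy sketch, not the authors' implementation. It makes several loud assumptions: molecules are represented as flat lists of atom tokens rather than graphs, frequent subgraph mining is replaced by counting contiguous token runs, the reinforcement-learning search is replaced by a simple epsilon-greedy bandit, and the score function is a made-up stand-in for a property such as the penalized $\log P$. All function and variable names are hypothetical.

```python
import random
from collections import Counter


def mine_building_blocks(molecules, max_size=2, min_support=2):
    """Toy stand-in for frequent subgraph mining (assumption: token runs, not graphs)."""
    support = Counter()
    for mol in molecules:
        seen = set()
        for size in range(1, max_size + 1):
            for i in range(len(mol) - size + 1):
                seen.add(tuple(mol[i:i + size]))
        support.update(seen)  # each molecule contributes at most once per block
    return sorted(b for b, c in support.items() if c >= min_support)


def reassemble(blocks, score, steps=3, episodes=200, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit over blocks (assumption: a simplification of RL-guided search)."""
    rng = random.Random(seed)
    value = {b: 0.0 for b in blocks}  # running mean reward per block
    count = {b: 0 for b in blocks}
    best, best_score, best_trace = None, float("-inf"), []
    for _ in range(episodes):
        mol, trace, chosen = [], [], []
        for _ in range(steps):
            if rng.random() < epsilon:
                block = rng.choice(blocks)  # explore
            else:
                block = max(blocks, key=lambda b: value[b])  # exploit
            chosen.append(block)
            mol = mol + list(block)
            trace.append(list(mol))  # keep every intermediate molecule
        reward = score(mol)
        for block in chosen:
            count[block] += 1
            value[block] += (reward - value[block]) / count[block]
        if reward > best_score:
            best, best_score, best_trace = mol, reward, trace
    return best, best_score, best_trace


def toy_score(mol):
    """Hypothetical property score: reward carbon, penalize nitrogen."""
    return mol.count("C") - 2 * mol.count("N")


# Hypothetical four-molecule "database" of atom-token sequences.
database = [["C", "C", "O"], ["C", "C", "N"], ["C", "O", "C"], ["N", "C", "C"]]
blocks = mine_building_blocks(database)
best, best_score, trace = reassemble(blocks, toy_score)
print(blocks)
print(best, best_score)
for intermediate in trace:
    print(intermediate)
```

The returned trace lists every intermediate structure built on the way to the final molecule, which is the sense in which a reassembling approach of this kind is interpretable: each step is an explicit choice of a mined building block rather than a move in a hidden space.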
URL
https://arxiv.org/abs/2302.00587