Abstract
Various representation learning methods for molecular structures have been devised to accelerate data-driven chemistry. However, the representational capabilities of existing methods are essentially limited to atom-level information, which is insufficient to describe real-world molecular physics. Although electron-level information can provide fundamental knowledge about chemical compounds beyond atom-level information, obtaining it for real-world molecules is computationally impractical and sometimes infeasible. We propose a method for learning electron-informed molecular representations without additional computational cost by transferring readily accessible electron-level information from small molecules to the large molecules of interest. The proposed method achieved state-of-the-art prediction accuracy on extensive benchmark datasets of experimentally observed molecular properties. The source code for HEDMoL is available at this https URL.
URL
https://arxiv.org/abs/2602.07087