Abstract
Enzyme engineering modifies wild-type proteins to meet industrial and research demands by enhancing catalytic activity, stability, binding affinity, and other properties. Deep learning methods for protein modeling have delivered superior results at lower cost than traditional approaches such as directed evolution and rational design. In mutation effect prediction, the key to pre-training deep learning models lies in accurately interpreting the complex relationships among protein sequence, structure, and function. This study introduces ProtREM, a retrieval-enhanced protein language model that analyzes both native properties, drawn from sequence and local structural interactions, and evolutionary properties, drawn from retrieved homologous sequences. The state-of-the-art performance of ProtREM is validated on over 2 million mutants across 217 assays from an open benchmark (ProteinGym). We also conducted post-hoc analyses of the model's ability to improve the stability and binding affinity of a VHH antibody. Additionally, we designed 10 new mutants of a DNA polymerase and conducted wet-lab experiments to evaluate their enhanced activity at higher temperatures. Both in silico and experimental evaluations confirmed that our method provides reliable predictions of mutation effects, offering an auxiliary tool for biologists aiming to evolve existing enzymes. The implementation is publicly available.
URL
https://arxiv.org/abs/2410.21127