Paper Reading AI Learner

DiffSDS: A language diffusion model for protein backbone inpainting under geometric conditions and constraints

2023-01-22 05:07:54
Zhangyang Gao, Cheng Tan, Stan Z. Li

Abstract

Have you ever been troubled by the complexity and computational cost of SE(3) protein structure modeling, and amazed by the simplicity and power of language modeling? Recent work has shown promise in simplifying protein structures as sequences of backbone angles, so that language models can be used for unconstrained protein backbone generation. Unfortunately, this simplification is unsuitable for the constrained protein inpainting problem, where the model must recover masked structures conditioned on unmasked ones, because it dramatically increases the computational cost of imposing geometric constraints. To overcome this dilemma, we suggest inserting a hidden atomic direction space (ADS) on top of the language model, which converts invariant backbone angles into equivalent direction vectors while preserving simplicity; we call this the Seq2Direct encoder (Enc_s2d). Geometric constraints can be imposed efficiently in the newly introduced direction space. A Direct2Seq decoder (Dec_d2s) with mathematical guarantees is also introduced, and the two together form the SDS model (Enc_s2d + Dec_d2s). We use the SDS model as the denoising network in a conditional diffusion process, resulting in a constrained generative model, DiffSDS. Extensive experiments show that the plug-and-play ADS can transform a language model into a strong structural model without loss of simplicity. More importantly, DiffSDS outperforms previous strong baselines by a large margin on the task of protein inpainting.
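
The abstract's geometric premise, that backbone angles and direction vectors carry equivalent information while distance-type constraints become cheap in direction space, can be illustrated with a small deterministic sketch. This is not the authors' Enc_s2d or Dec_d2s (their exact construction is in the paper); it assumes NeRF-style placement, an arbitrary seed frame, and approximate fixed bond lengths, and every function name here (angles_to_directions, directions_to_angles, closure_gap) is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def angles_to_directions(bond_angles: Sequence[float], torsions: Sequence[float],
                         u0: Vec, u1: Vec) -> List[Vec]:
    """Hypothetical: map (bond angle, torsion) pairs to unit bond directions.

    u0 and u1 seed the frame (they must not be parallel). Each new direction
    forms bond angle theta with -u1 and dihedral phi with (u0, u1), NeRF-style."""
    dirs = [unit(u0), unit(u1)]
    for theta, phi in zip(bond_angles, torsions):
        prev, cur = dirs[-2], dirs[-1]
        n = unit(cross(prev, cur))  # normal to the plane of the last two bonds
        m = cross(n, cur)           # completes the orthonormal frame (cur, m, n)
        local = (-math.cos(theta),
                 math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi))
        dirs.append(tuple(local[0] * cur[k] + local[1] * m[k] + local[2] * n[k]
                          for k in range(3)))
    return dirs


def directions_to_angles(dirs: Sequence[Vec]) -> Tuple[List[float], List[float]]:
    """Hypothetical inverse map: recover (bond angle, torsion) from directions."""
    thetas, phis = [], []
    for u0, u1, u2 in zip(dirs, dirs[1:], dirs[2:]):
        # Bond angle at the atom shared by bonds u1 and u2.
        thetas.append(math.acos(max(-1.0, min(1.0, -dot(u1, u2)))))
        # Dihedral via the standard atan2 formula with b0 = -u0, b1 = u1, b2 = u2.
        b0 = (-u0[0], -u0[1], -u0[2])
        v = sub(b0, tuple(dot(b0, u1) * x for x in u1))
        w = sub(u2, tuple(dot(u2, u1) * x for x in u1))
        phis.append(math.atan2(dot(cross(u1, v), w), dot(v, w)))
    return thetas, phis


def closure_gap(start: Vec, dirs: Sequence[Vec], lengths: Sequence[float],
                target: Vec) -> float:
    """Hypothetical inpainting constraint: distance between the end of a
    segment (start + sum of r_i * u_i) and the next unmasked atom."""
    end = list(start)
    for u, r in zip(dirs, lengths):
        for k in range(3):
            end[k] += r * u[k]
    diff = sub((end[0], end[1], end[2]), target)
    return math.sqrt(dot(diff, diff))


# Round trip on random angles (torsions kept inside (-pi, pi) to avoid wrap-around).
random.seed(0)
thetas = [random.uniform(1.8, 2.2) for _ in range(9)]
phis = [random.uniform(-3.0, 3.0) for _ in range(9)]
dirs = angles_to_directions(thetas, phis, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
rec_thetas, rec_phis = directions_to_angles(dirs)
# Assumed approximate N-CA, CA-C, C-N bond lengths in angstroms, repeated.
lengths = ([1.46, 1.52, 1.33] * 4)[:len(dirs)]
print(max(abs(a - b) for a, b in zip(thetas + phis, rec_thetas + rec_phis)))
print(closure_gap((0.0, 0.0, 0.0), dirs, lengths, (5.0, 0.0, 0.0)))
```

In this sketch every atom position is the start point plus a running sum of r_i * u_i, so a loop-closure check over a masked segment is a single linear sum over directions, whereas working from angles alone means replaying the sequential chain of rotations first. That is one plausible reading of why the abstract says constraints are cheaper to impose in direction space.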

URL

https://arxiv.org/abs/2301.09642

PDF

https://arxiv.org/pdf/2301.09642.pdf
