Paper Reading AI Learner

Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

2024-04-23 12:35:44
Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, Yu Rong

Abstract

Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representations, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn knowledge from different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual descriptions, which is crucial for downstream tasks. Furthermore, such information cannot be modeled with a similar global alignment strategy because existing datasets contain little paired data annotated at the level of local parts. In this paper, we propose Atomas, a multi-modal molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between the two modalities and align these fragment representations at three levels. Additionally, Atomas's end-to-end training framework incorporates both molecule understanding and molecule generation tasks, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning task and the molecule generation task. Moreover, visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our code can be found at https://anonymous.4open.science/r/Atomas-03C3.
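
For readers unfamiliar with the retrieval metric mentioned in the abstract, here is a minimal sketch of how recall@k is commonly computed for text-to-molecule retrieval. It is not taken from the Atomas code. The function names, the toy embeddings, and the convention that query i is paired with candidate i are all assumptions made for illustration.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def recall_at_k(
    queries: Sequence[Sequence[float]],
    candidates: Sequence[Sequence[float]],
    k: int = 1,
) -> float:
    """Fraction of queries whose paired candidate ranks in the top k.

    Assumption: queries[i] is paired with candidates[i].
    """
    hits = 0
    for i, query in enumerate(queries):
        scores = [dot(query, cand) for cand in candidates]
        # Candidate indices sorted by descending similarity to the query.
        ranked = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical 2-d embeddings: three text descriptions and three molecules.
text_emb = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
mol_emb = [[0.9, 0.1], [0.8, 0.6], [0.0, 1.0]]

print(recall_at_k(text_emb, mol_emb, k=1))
print(recall_at_k(text_emb, mol_emb, k=2))
```

In this toy setup only the first text query ranks its own molecule first, so recall@1 is one third, while every paired molecule appears within the top two, so recall@2 is 1.0.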

URL

https://arxiv.org/abs/2404.16880

PDF

https://arxiv.org/pdf/2404.16880.pdf

