Multi-modal Protein Knowledge Graph Construction and Applications

2022-09-17 09:35:07

Siyuan Cheng, Xiaozhuan Liang, Zhen Bi, Huajun Chen, Ningyu Zhang

arXiv_AI

Abstract

Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biological knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using the Gene Ontology (GO) and the UniProt knowledge base as a basis, we transform and integrate various kinds of knowledge, aligning descriptions to GO terms and protein sequences to protein entities. ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of the Gene Ontology to protein function and structure prediction. The current version contains about 614,099 entities and 5,620,437 triples (5,510,437 protein-GO triples and 110,000 GO-GO triples). We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset is available for download.
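
To make the triple structure described in the abstract concrete, here is a minimal sketch of how protein-GO and GO-GO triples could be held in memory and counted by kind. The entity identifiers, relation names, and the `Triple`, `triple_kind`, and `count_kinds` helpers are all hypothetical, chosen only for illustration; they are not taken from the ProteinKG65 release, whose actual file format is not described here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) edge of a knowledge graph."""
    head: str
    relation: str
    tail: str


# Hypothetical toy triples: identifiers and relation names are made up
# for illustration and do not come from the ProteinKG65 dataset.
TRIPLES = [
    Triple("protein:P1", "enables", "GO:A"),
    Triple("protein:P2", "involved_in", "GO:B"),
    Triple("GO:A", "is_a", "GO:B"),
]


def triple_kind(t: Triple) -> str:
    """Classify a triple as GO-GO or protein-GO by the prefix of its head."""
    return "GO-GO" if t.head.startswith("GO:") else "protein-GO"


def count_kinds(triples: Iterable[Triple]) -> Counter:
    """Count how many triples fall into each kind."""
    return Counter(triple_kind(t) for t in triples)


counts = count_kinds(TRIPLES)
entities = {e for t in TRIPLES for e in (t.head, t.tail)}

# The two triple counts quoted in the abstract add up to the stated total.
assert 5_510_437 + 110_000 == 5_620_437
```

Classifying by head prefix is only one possible convention; a real loader would more likely rely on the typing information shipped with the dataset.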

URL

https://arxiv.org/abs/2207.10080

PDF

https://arxiv.org/pdf/2207.10080.pdf