Abstract
The convergence of materials science and artificial intelligence has opened new opportunities for gathering and analyzing knowledge from the extensive scientific literature and for generating novel materials. Despite these potential benefits, persistent challenges remain, such as manual annotation, precise extraction, and traceability. Large language models have emerged as promising solutions to these obstacles. This paper introduces the Functional Materials Knowledge Graph (FMKG), a multidisciplinary materials science knowledge graph. Using advanced natural language processing techniques, it extracts millions of entities and forms triples from a corpus comprising all high-quality research papers published in the last decade. It organizes unstructured information into nine distinct labels, covering Name, Formula, Acronym, Structure/Phase, Properties, Descriptor, Synthesis, Characterization Method, Application, and Domain, and seamlessly integrates the papers' Digital Object Identifiers. As the latest structured database for functional materials, FMKG acts as a powerful catalyst for expediting the development of functional materials and as a foundation for building a more comprehensive materials knowledge graph from full paper text. Furthermore, our research lays the groundwork for practical text-mining-based knowledge management systems, applicable not only to intricate materials systems but also to other specialized domains.
URL
https://arxiv.org/abs/2404.03080