Extracting ORR Catalyst Information for Fuel Cell from Scientific Literature

2025-07-10 07:35:12
Hein Htet, Amgad Ahmed Ali Ibrahim, Yutaka Sasaki, Ryoji Asahi

Abstract

The oxygen reduction reaction (ORR) catalyst plays a critical role in enhancing fuel cell efficiency, making it a key focus of materials science research. However, extracting structured information about ORR catalysts from the vast scientific literature remains a significant challenge because of the complexity and diversity of textual data. In this study, we propose a named entity recognition (NER) and relation extraction (RE) approach using DyGIE++ with multiple pre-trained BERT variants, including MatSciBERT and PubMedBERT, to extract ORR catalyst-related information from the scientific literature; the annotated literature is compiled into a fuel cell corpus for materials informatics (FC-CoMIcs). A comprehensive dataset was constructed manually by identifying 12 critical entity types and two relation types between entity pairs. Our methodology involves data annotation, integration, and fine-tuning of transformer-based models to improve information extraction accuracy. We assess the impact of different BERT variants on extraction performance and investigate the effects of annotation consistency. Experimental evaluations show that the fine-tuned PubMedBERT model achieves the highest NER F1-score of 82.19% and the MatSciBERT model attains the best RE F1-score of 66.10%. Furthermore, a comparison with human annotators highlights the reliability of the fine-tuned models for ORR catalyst extraction, demonstrating their potential for scalable, automated literature analysis. The results indicate that domain-specific BERT models outperform general scientific models like BlueBERT for ORR catalyst extraction.
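
The F1-scores quoted in the abstract can be made concrete with a small example. The sketch below is only an illustration, assuming exact-match scoring in which a predicted entity counts as correct when its span and label both match a gold annotation, and a predicted relation counts when both argument spans and the relation label match. The abstract does not state the exact matching criterion, and the labels used here (CATALYST, VALUE, PROPERTY, HAS_VALUE, HAS_PROPERTY) are hypothetical placeholders, not the 12 entity types or two relation types of FC-CoMIcs.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    gold: Iterable[Hashable], pred: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two sets of annotations."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Entities as (start_token, end_token, label). Labels are hypothetical.
gold_entities = [(0, 1, "CATALYST"), (5, 5, "VALUE"), (8, 9, "PROPERTY")]
pred_entities = [(0, 1, "CATALYST"), (5, 5, "VALUE"), (8, 8, "PROPERTY")]

# Relations as (head_span, tail_span, label). Labels are hypothetical.
gold_relations = [
    ((0, 1), (5, 5), "HAS_VALUE"),
    ((0, 1), (8, 9), "HAS_PROPERTY"),
]
pred_relations = [((0, 1), (5, 5), "HAS_VALUE")]

ner_p, ner_r, ner_f1 = precision_recall_f1(gold_entities, pred_entities)
re_p, re_r, re_f1 = precision_recall_f1(gold_relations, pred_relations)

print(f"NER  P={ner_p:.3f} R={ner_r:.3f} F1={ner_f1:.3f}")
print(f"RE   P={re_p:.3f} R={re_r:.3f} F1={re_f1:.3f}")
```

In this toy case the third predicted entity has a wrong boundary, so it counts as both a false positive and a false negative. A relation can only be correct when both of its argument spans are, which is one reason RE scores tend to sit below NER scores under this kind of matching.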

URL

https://arxiv.org/abs/2507.07499

PDF

https://arxiv.org/pdf/2507.07499.pdf

