Paper Reading AI Learner

Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph

2024-05-03 14:03:04
Vladyslav Nechakhin, Jennifer D'Souza, Steffen Eger

Abstract

Structured science summaries or research contributions that use properties or dimensions beyond traditional keywords enhance science findability. Current methods, such as those used by the Open Research Knowledge Graph (ORKG), involve manually curating properties to describe research papers' contributions in a structured manner, but this is labor-intensive and inconsistent across domain-expert human curators. We propose using Large Language Models (LLMs) to automatically suggest these properties. However, it is essential to assess the readiness of LLMs such as GPT-3.5, Llama 2, and Mistral for this task before applying them. Our study performs a comprehensive comparative analysis between ORKG's manually curated properties and those generated by the aforementioned state-of-the-art LLMs. We evaluate LLM performance from four unique perspectives: semantic alignment with and deviation from ORKG properties, fine-grained property mapping accuracy, SciNCL embeddings-based cosine similarity, and expert surveys comparing manual annotations with LLM outputs. These evaluations take place in a multidisciplinary science setting. Overall, LLMs show potential as recommendation systems for structuring science, but further fine-tuning is recommended to improve their alignment with scientific tasks and their mimicry of human expertise.
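To illustrate one of the evaluation perspectives, the sketch below shows how an embeddings-based cosine similarity between a manually curated ORKG property set and an LLM-suggested one could be computed with the publicly available SciNCL checkpoint (`malteos/scincl` on Hugging Face). This is not the authors' code; the two property strings are hypothetical examples, and the [CLS]-token pooling follows the SciNCL model card rather than anything stated in the abstract.

```python
# Minimal sketch: cosine similarity between SciNCL embeddings of two property sets.
# "malteos/scincl" is the public Hugging Face checkpoint; the property strings
# below are invented for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("malteos/scincl")
model = AutoModel.from_pretrained("malteos/scincl")

def embed(text: str) -> torch.Tensor:
    """Encode text and return the [CLS] token embedding (SciNCL convention)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] embedding, shape (1, hidden)

# Hypothetical property sets: one manually curated in ORKG, one LLM-suggested.
orkg_properties = "research problem, method, dataset, evaluation metric, result"
llm_properties = "problem statement, approach, data used, metrics, findings"

similarity = torch.nn.functional.cosine_similarity(
    embed(orkg_properties), embed(llm_properties)
)
print(f"Cosine similarity: {similarity.item():.3f}")
```

A higher cosine similarity would indicate closer semantic agreement between the two property sets; the paper's actual protocol may pool or pair properties differently.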

URL

https://arxiv.org/abs/2405.02105

PDF

https://arxiv.org/pdf/2405.02105.pdf

