Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization

2024-03-24 16:30:05
Ryan Barron, Maksim E. Eren, Manish Bhattarai, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas, Cynthia Matuszek

Abstract

Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers. As this textual data continues to expand, document organization methods become increasingly crucial for extracting actionable insights hidden within large text datasets. Knowledge Graphs (KGs) store factual information in a structured manner, providing explicit, interpretable knowledge that includes domain-specific information from the cybersecurity scientific literature. One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text. In this paper, we address this topic and introduce a method for building a multi-modal KG by extracting structured ontology from scientific papers. We demonstrate this concept in the cybersecurity domain. One modality of the KG represents observable information from the papers, such as the categories in which they were published or the authors. The second modality uncovers latent (hidden) patterns of text extracted through hierarchical and semantic non-negative matrix factorization (NMF), such as named entities, topics or clusters, and keywords. We illustrate this concept by consolidating more than two million scientific papers uploaded to arXiv into the cyber-domain, using hierarchical and semantic NMF, and by building a cyber-domain-specific KG.
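
To make the latent modality more concrete, the following is a minimal sketch of plain (flat, non-hierarchical) NMF on a toy term-document matrix, followed by turning the result into KG-style triples. It is not the authors' implementation: the toy counts, the document names, the relation names HAS_TOPIC and HAS_KEYWORD, and the use of standard multiplicative updates are all assumptions made for illustration. A hierarchical variant would, roughly, re-apply the same factorization to the documents assigned to each topic.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, iters=500, eps=1e-9, seed=0):
    """Approximate nonnegative X (terms x docs) as W @ H with k topics,
    using multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy data: rows are terms, columns are documents (term counts).
terms = ["malware", "ransomware", "phishing", "email"]
docs = ["doc0", "doc1", "doc2", "doc3"]
X = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]

k = 2
W, H = nmf(X, k)

# Assign each document and each term to its strongest latent topic.
doc_topic = [max(range(k), key=lambda t: H[t][j]) for j in range(len(docs))]
term_topic = [max(range(k), key=lambda t: W[i][t]) for i in range(len(terms))]

# Latent-modality triples (relation names are illustrative assumptions).
triples = [(docs[j], "HAS_TOPIC", f"topic_{doc_topic[j]}") for j in range(len(docs))]
triples += [(f"topic_{term_topic[i]}", "HAS_KEYWORD", terms[i]) for i in range(len(terms))]

for triple in triples:
    print(triple)
```

In a sketch like this, the observable modality (authors, arXiv categories) would be added as further triples taken directly from paper metadata, with no factorization involved.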

URL

https://arxiv.org/abs/2403.16222

PDF

https://arxiv.org/pdf/2403.16222.pdf

