
Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

2019-03-19 18:31:23
Rafael S. Gonçalves, Maulik R. Kamdar, Mark A. Musen

Abstract

The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity: there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represented in a uniform way that can be queried effectively. One step toward uniformly represented metadata is to normalize the multiple, distinct field names used in metadata (e.g., lat lon, lat and long) to describe the same type of value. To that end, we present a new method based on clustering and embeddings (i.e., vector representations of words) to align metadata field names with ontology terms. We apply our method to biomedical metadata by generating embeddings for terms in biomedical ontologies from the BioPortal repository. We carried out a comparative study between our method and the NCBO Annotator, which revealed that our method yields more and substantially better alignments between metadata and ontology terms.
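
To make the idea of aligning field names with ontology terms through vector representations concrete, here is a minimal sketch. It is not the authors' method: character trigram counts stand in for learned word embeddings, the clustering step is omitted, and the function names (char_ngrams, cosine, align), the threshold value, and the example term labels are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Bag of character n-grams; a crude stand-in for a learned embedding."""
    # Lowercase, turn punctuation such as "_" into spaces, collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    padded = f" {' '.join(cleaned.split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(field_names, term_labels, threshold=0.3):
    """Map each field name to its most similar term label, or None.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    term_vectors = {label: char_ngrams(label) for label in term_labels}
    alignments = {}
    for field in field_names:
        vec = char_ngrams(field)
        best_label, best_score = max(
            ((label, cosine(vec, tv)) for label, tv in term_vectors.items()),
            key=lambda pair: pair[1],
        )
        alignments[field] = best_label if best_score >= threshold else None
    return alignments


if __name__ == "__main__":
    # Hypothetical ontology term labels and metadata field names.
    terms = ["latitude", "longitude", "sample collection date"]
    fields = ["Latitude", "collection_date", "xyz"]
    for field, term in align(fields, terms).items():
        print(f"{field!r} -> {term!r}")
```

A real pipeline would replace char_ngrams with embeddings trained on ontology text, and would group similar field names before alignment, as the abstract describes.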


URL

https://arxiv.org/abs/1903.08206

PDF

https://arxiv.org/pdf/1903.08206.pdf

