CollaboNet: collaboration of deep neural networks for biomedical named entity recognition

2019-05-29 16:34:50
Wonjin Yoon, Chan Ho So, Jinhyuk Lee, Jaewoo Kang

Abstract

Background: Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and have shown promising results. However, because deep learning approaches need abundant training data, a lack of data can hinder performance. BioNER datasets are scarce, and each dataset covers only a small subset of entity types. Furthermore, many biomedical entities are polysemous, which is one of the major obstacles in named entity recognition.

Results: To address the lack of data and the entity type misclassification problem, we propose CollaboNet, which combines multiple NER models. In CollaboNet, models trained on different datasets are connected to each other so that a target model obtains information from the other collaborator models to reduce false positives. Each model is an expert on its target entity type, and the models take turns serving as the target and as collaborators during training. The experimental results show that CollaboNet can greatly reduce the number of false positives and misclassified entities, including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score.

Conclusions: We demonstrated the benefits of combining multiple models for BioNER. Our model reduced the number of misclassified entities and improved performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.
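The turn-taking scheme described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how collaborator information reaches the target model, so the concatenation in `target_features` is an assumption, and the function names and the entity types used in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def rotation_schedule(entity_types: List[str], rounds: int) -> List[Tuple[str, List[str]]]:
    """Return (target, collaborators) pairs in round-robin order.

    Each entity-type model serves as the target once per round; all other
    models act as its collaborators for that step.
    """
    schedule = []
    for _ in range(rounds):
        for target in entity_types:
            collaborators = [t for t in entity_types if t != target]
            schedule.append((target, collaborators))
    return schedule


def target_features(own: List[float],
                    collaborator_outputs: Dict[str, List[float]]) -> List[float]:
    """Combine the target's own features with collaborator outputs.

    ASSUMPTION: plain concatenation, in sorted collaborator-name order.
    The abstract only says the target "obtains information" from the
    collaborators; the actual mechanism may differ.
    """
    combined = list(own)
    for name in sorted(collaborator_outputs):
        combined.extend(collaborator_outputs[name])
    return combined
```

For example, `rotation_schedule(["gene", "chemical", "disease"], rounds=1)` yields three steps, the first with `"gene"` as the target and `["chemical", "disease"]` as collaborators. In a real system each step would update only the target model, using the collaborators' outputs as extra input.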

URL

https://arxiv.org/abs/1809.07950

PDF

https://arxiv.org/pdf/1809.07950.pdf

