Paper Reading AI Learner

COVID-19 Literature Mining and Retrieval using Text Mining Approaches

2022-05-29 22:34:19
Sanku Satya Uday, Satti Thanuja Pavani, T. Jaya Lakshmi, Rohit Chivukula

Abstract

The novel coronavirus disease (COVID-19) began in Wuhan, China, in late 2019 and to date has infected over 148M people worldwide, resulting in 3.12M deaths. On March 10, 2020, the World Health Organisation (WHO) declared it as a global pandemic. Many academicians and researchers started to publish papers describing the latest discoveries on covid-19. The large influx of publications made it hard for other researchers to go through a large amount of data and find the appropriate one that helps their research. So, the proposed model attempts to extract relavent titles from the large corpus of research publications which makes the job easy for the researchers. Allen Institute for AI released the CORD-19 dataset, which consists of 2,00,000 journal articles related to coronavirus-related research publications from PubMed's PMC, WHO (World Health Organization), bioRxiv, and medRxiv pre-prints. Along with this document corpus, they have also provided a topics dataset named topics-rnd3 consisting of a list of topics. Each topic has three types of representations like query, question, and narrative. These Datasets are made open for research, and also they released a TREC-COVID competition on Kaggle. Using these topics like queries, our goal is to find out the relevant documents in the CORD-19 dataset. In this research, relevant documents should be recognized for the posed topics in topics-rnd3 data set. The proposed model uses Natural Language Processing(NLP) techniques like Bag-of-Words, Average Word-2-Vec, Average BERT Base model and Tf-Idf weighted Word2Vec model to fabricate vectors for query, question, narrative, and combinations of them. Similarly, fabricate vectors for titles in the CORD-19 dataset. After fabricating vectors, cosine similarity is used for finding similarities between every two vectors. Cosine similarity helps us to find relevant documents for the given topic.

Abstract (translated)

URL

https://arxiv.org/abs/2205.14781

PDF

https://arxiv.org/pdf/2205.14781.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot