A Named Entity Recognition and Topic Modeling-based Solution for Locating and Better Assessment of Natural Disasters in Social Media

2024-05-01 23:19:49
Ayaz Mehmood, Muhammad Tayyab Zamir, Muhammad Asif Ayub, Nasir Ahmad, Kashif Ahmad

Abstract

Over the last decade, social media content has proven very effective in disaster informatics, as it has in other application domains. However, the unstructured nature of the data poses several challenges for disaster analysis. To fully exploit the potential of social media content in disaster informatics, access to relevant content and correct geo-location information is critical. In this paper, we propose a three-step solution to these challenges. First, the proposed solution classifies social media posts as relevant or irrelevant. Second, it automatically extracts location information from the text of the posts through Named Entity Recognition (NER). Finally, to quickly analyze the topics covered in large volumes of social media posts, we perform topic modeling, which yields a list of top keywords that highlight the issues discussed in the tweets. For the Relevant Classification of Twitter Posts (RCTP), we propose a merit-based fusion framework that combines four models, namely BERT, RoBERTa, DistilBERT, and ALBERT, obtaining the highest F1-score of 0.933 on a benchmark dataset. For the Location Extraction from Twitter Text (LETT), we evaluate four models, namely BERT, RoBERTa, DistilBERT, and ELECTRA, in an NER framework, obtaining the highest F1-score of 0.960. For topic modeling, we use the BERTopic library to discover hidden topic patterns in the relevant tweets. The experimental results for all components of the proposed end-to-end solution are very encouraging and hint at the potential of social media content and NLP in disaster management.
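
The abstract does not spell out how the merit-based fusion is computed. As a rough illustration only, the sketch below assumes one common reading: each model's class probabilities are averaged with weights proportional to that model's validation F1-score. The function names (merit_weights, fuse, predict) and the weighting rule are hypothetical and may differ from the authors' actual method.

```python
from typing import Dict, List


def merit_weights(val_f1: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-model validation F1-scores into weights that sum to 1.

    Assumption: "merit" is taken to mean validation F1; the paper may define
    it differently.
    """
    total = sum(val_f1.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive F1-score")
    return {name: score / total for name, score in val_f1.items()}


def fuse(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of the per-model class-probability vectors."""
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for name, p in probs.items():
        if len(p) != n_classes:
            raise ValueError(f"model {name!r} has a different number of classes")
        for i, value in enumerate(p):
            fused[i] += weights[name] * value
    return fused


def predict(probs: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse(probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

In use, probs would hold the softmax output of each fine-tuned classifier for a single tweet (for example, index 0 for irrelevant and index 1 for relevant), and weights would be computed once from held-out validation scores.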

URL

https://arxiv.org/abs/2405.00903

PDF

https://arxiv.org/pdf/2405.00903.pdf

