Theme-driven Keyphrase Extraction from Social Media on Opioid Recovery

2023-01-27 03:00:46
William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, Sarah Preum

Abstract

An emerging trend on social media platforms is their use as safe spaces for peer support. Particularly in healthcare, where many medical conditions carry harsh stigmas, social media has become a stigma-free way to engage in dialogue about symptoms, treatments, and personal experiences. Many existing works have employed NLP algorithms to facilitate quantitative analysis of health trends. Notably absent from existing works are keyphrase extraction (KE) models for social health posts, a task crucial to discovering emerging public health trends. This paper presents a novel, theme-driven KE dataset, SuboxoPhrase, and a qualitative annotation scheme with the overarching goal of extracting targeted, clinically relevant keyphrases. To the best of our knowledge, this is the first study to design a KE schema for social media healthcare texts. To demonstrate the value of this approach, this study analyzes Reddit posts regarding medications for opioid use disorder, a paramount health concern worldwide. Additionally, we benchmark ten off-the-shelf KE models on our new dataset, demonstrating the unique extraction challenges in modeling user-generated health texts. The proposed theme-driven KE approach lays the foundation for future work on efficient, large-scale analysis of social health texts, allowing researchers to surface useful public health trends, patterns, and knowledge gaps.
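
For readers unfamiliar with how KE models are typically scored against annotated keyphrases, the following is a minimal sketch of exact-match precision, recall, and F1. The abstract does not state which metric the benchmark uses, so the metric choice here is an assumption, and the function names and example phrases are hypothetical and not taken from SuboxoPhrase.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def exact_match_prf(predicted, gold):
    """Return (precision, recall, f1) over de-duplicated, normalized phrases.

    Assumption: a predicted phrase counts as correct only if it matches a
    gold phrase exactly after normalization (no partial credit).
    """
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical post-level example with made-up phrases.
gold = ["precipitated withdrawal", "taper schedule", "night sweats"]
predicted = ["Precipitated  withdrawal", "night sweats", "doctor"]
precision, recall, f1 = exact_match_prf(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact matching is a strict criterion; evaluations of user-generated text sometimes also report partial or stemmed matches, which this sketch does not cover.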

URL

https://arxiv.org/abs/2301.11508

PDF

https://arxiv.org/pdf/2301.11508.pdf

