Abstract
An emerging trend on social media platforms is their use as safe spaces for peer support. Particularly in healthcare, where many medical conditions carry harsh stigmas, social media has become a stigma-free way to engage in dialogues regarding symptoms, treatments, and personal experiences. Many existing works have employed NLP algorithms to facilitate quantitative analysis of health trends. Notably absent from existing works are keyphrase extraction (KE) models for social health posts, a task crucial to discovering emerging public health trends. This paper presents a novel, theme-driven KE dataset, SuboxoPhrase, and a qualitative annotation scheme with an overarching goal of extracting targeted, clinically relevant keyphrases. To the best of our knowledge, this is the first study to design a KE schema for social media healthcare texts. To demonstrate the value of this approach, this study analyzes Reddit posts regarding medications for opioid use disorder, a paramount health concern worldwide. Additionally, we benchmark ten off-the-shelf KE models on our new dataset, demonstrating the unique extraction challenges in modeling user-generated health texts. The proposed theme-driven KE approach lays the foundation for future work on efficient, large-scale analysis of social health texts, allowing researchers to surface useful public health trends, patterns, and knowledge gaps.
URL
https://arxiv.org/abs/2301.11508