SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

2024-04-19 06:58:51
Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

Abstract

On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, creating challenges for timely intervention. Developing deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated on two tasks: binary classification of high versus low suicide risk, and fine-grained suicide risk classification on a scale of 0 to 10. In our experiments, deep learning models perform well in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification remain unsatisfactory, with a weighted F1 score of 50.89%. To address data imbalance and limited dataset size, we investigated both traditional and advanced, large language model-based data augmentation techniques, demonstrating that data augmentation can improve model performance by up to 4.65 percentage points in F1 score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for the automatic identification of individuals at risk of suicide, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.
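The abstract reports a weighted F1 score for the fine-grained task, where class imbalance matters. As a minimal sketch of what that metric generally means (not the authors' evaluation code), the snippet below computes per-class F1 and averages it weighted by class support, alongside the unweighted macro average for comparison. The function names and the toy labels are hypothetical and are not taken from the SOS-1K dataset.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_label(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


def macro_f1(y_true, y_pred):
    """Unweighted average of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy, imbalanced labels (hypothetical; three risk levels for brevity).
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 2, 2]
    print(f"weighted F1: {weighted_f1(y_true, y_pred):.4f}")
    print(f"macro F1:    {macro_f1(y_true, y_pred):.4f}")
```

With imbalanced labels, the weighted average is dominated by the frequent classes, so it can differ noticeably from the macro average. This is one reason to check which averaging a reported F1 score uses.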

URL

https://arxiv.org/abs/2404.12659

PDF

https://arxiv.org/pdf/2404.12659.pdf

