Paper Reading AI Learner

Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

2024-04-25 09:42:50
Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin

Abstract

Mental health in children and adolescents has been steadily deteriorating over the past few years [1]. The recent advent of Large Language Models (LLMs) offers much hope for cost- and time-efficient scaling of monitoring and intervention, yet despite particularly prevalent issues such as school bullying and eating disorders, previous studies have not investigated LLM performance in this domain, nor for open information extraction, where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19, annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT, and compare the expert labels to annotations from two top-performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find that GPT4 is on par with human inter-annotator agreement, and that performance on synthetic data is substantially higher; however, the model still occasionally errs on issues of negation and factuality, and the higher performance on synthetic data is driven by the greater complexity of real data rather than any inherent advantage.
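The comparison the abstract describes — scoring LLM annotations against expert labels per category — is commonly done with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only: the category names come from the paper, but the toy labels and this stdlib-only kappa implementation are assumptions, not the authors' code or data.

```python
# Hedged sketch: per-category agreement between expert and model labels,
# using Cohen's kappa on binary "category present / absent" annotations.
from collections import Counter

CATEGORIES = ["TRAUMA", "PRECARITY", "CONDITION",
              "SYMPTOMS", "SUICIDALITY", "TREATMENT"]

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum((counts_a[k] / n) * (counts_b[k] / n)
                   for k in set(a) | set(b))
    if expected == 1.0:          # degenerate case: only one label used
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy example (hypothetical labels): 1 = category annotated in the post.
expert = {"SUICIDALITY": [1, 0, 1, 1, 0, 0, 1, 0]}
model  = {"SUICIDALITY": [1, 0, 1, 0, 0, 0, 1, 1]}

for cat in expert:
    k = cohen_kappa(expert[cat], model[cat])
    print(f"{cat}: kappa = {k:.2f}")   # prints "SUICIDALITY: kappa = 0.50"
```

"On par with human inter-annotator agreement" would then mean the model-vs-expert statistic is comparable to the expert-vs-expert statistic computed the same way.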


URL

https://arxiv.org/abs/2404.16461

PDF

https://arxiv.org/pdf/2404.16461.pdf

