Paper Reading AI Learner

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation

2024-10-30 17:57:44
Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam

Abstract

This work examines the efficiency of knowledge distillation for producing a lightweight yet powerful BERT-based model for natural language processing applications. We applied the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns in social media text. LastBERT, a customized student BERT model, reduces the parameter count from the 110 million of BERT base to 29 million, making the model approximately 73.64% smaller. On the GLUE benchmark, which spans paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. Applied to a real-world ADHD dataset, the model achieved an accuracy and F1 score of 85%. Compared with DistilBERT (66M parameters) and ClinicalBERT (110M parameters), LastBERT performed comparably: DistilBERT slightly outperformed it at 87%, and ClinicalBERT reached 86% on the same metrics. These findings highlight LastBERT's ability to classify degrees of ADHD severity accurately, offering mental health professionals a useful tool for assessing and understanding user-generated content on social media platforms. The study underscores the potential of knowledge distillation to produce effective models suited to resource-constrained settings, advancing both NLP and mental health diagnosis. Moreover, the considerable reduction in model size without appreciable performance loss lowers the computational resources needed for training and deployment, broadening applicability, especially with readily available computational tools such as Google Colab. This study demonstrates the accessibility and usefulness of advanced NLP methods in practical, real-world applications.
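To make the distillation setup described above concrete, the sketch below shows a standard response-based knowledge distillation objective in PyTorch with Hugging Face transformers: a small student BERT configuration sized to land near the reported 29 million parameters, trained against a BERT-base teacher using a temperature-scaled KL term blended with hard-label cross-entropy. The abstract does not specify LastBERT's exact architecture, the distillation hyperparameters, or the number of severity classes, so the configuration, `temperature`, `alpha`, and `num_labels` values here are illustrative assumptions, not the authors' settings.

```python
import torch.nn.functional as F
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical student configuration: hidden size, depth, and FFN width are
# chosen only so the parameter count lands near the reported ~29M; the
# actual LastBERT architecture may differ.
student_config = BertConfig(
    hidden_size=512,
    num_hidden_layers=4,
    num_attention_heads=8,
    intermediate_size=2048,
    num_labels=3,  # assumed number of ADHD severity classes
)
student = BertForSequenceClassification(student_config)
print(sum(p.numel() for p in student.parameters()))  # roughly 29M

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target KL term (Hinton-style) blended with hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training loop, the teacher logits would come from a fine-tuned BERT-base model run under `torch.no_grad()` on the same batch, so only the student receives gradient updates.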


URL

https://arxiv.org/abs/2411.00052

PDF

https://arxiv.org/pdf/2411.00052.pdf

