Abstract
This work examines the efficiency of knowledge distillation in producing a lightweight yet powerful BERT-based model for natural language processing applications. We then applied the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text. LastBERT, a customized student BERT model, reduces the parameter count from 110 million in BERT-base to 29 million, making it approximately 73.64% smaller. On the GLUE benchmark, which includes paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. On a real-world ADHD dataset, the model achieved an accuracy and F1 score of 85%. Compared to DistilBERT (66M parameters) and ClinicalBERT (110M parameters), LastBERT performed comparably, with DistilBERT slightly ahead at 87% and ClinicalBERT at 86% on the same metrics. These findings highlight LastBERT's ability to classify degrees of ADHD severity accurately, offering mental health professionals a useful tool for assessing and understanding user-generated content on social media platforms. The study underscores the potential of knowledge distillation to produce effective models suited to resource-constrained settings, advancing both NLP and mental health diagnostics. Moreover, the substantial reduction in model size without appreciable performance loss lowers the computational resources required for training and deployment, broadening applicability, especially with readily available tools such as Google Colab. This study demonstrates the accessibility and usefulness of advanced NLP methods in practical, real-world applications.
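For readers unfamiliar with the technique, the sketch below illustrates a standard soft-target knowledge-distillation loss of the kind the abstract refers to. The temperature, mixing weight, and training-loop details are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal knowledge-distillation loss sketch (PyTorch).
# Hyperparameters (temperature, alpha) are assumed values for illustration only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    # Soft targets: student mimics the teacher's temperature-scaled distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage inside a training step (teacher frozen, student trainable):
#   with torch.no_grad():
#       teacher_logits = teacher(**batch).logits
#   student_logits = student(**batch).logits
#   loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
```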
URL
https://arxiv.org/abs/2411.00052