How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses

2024-05-01 02:59:10
Jionghao Lin, Eason Chen, Zeifei Han, Ashish Gurung, Danielle R. Thomas, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger

Abstract

Automated explanatory feedback systems play a crucial role in facilitating learning for a large cohort of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real time poses challenges, particularly when high classification accuracy for domain-specific, nuanced responses is essential. Our study leverages the capabilities of large language models, specifically Generative Pre-Trained Transformers (GPT), to explore a sequence labeling approach focused on identifying components of desired and less desired praise for providing explanatory feedback within a tutor training dataset. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing explanatory feedback, we employed two commonly used approaches: prompting and fine-tuning. To quantify the quality of highlighted praise components identified by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score effectively correlates with human judgment in evaluating sequence quality; (2) using two-shot prompting on GPT-3.5 resulted in decent performance in recognizing effort-based (M-IoU of 0.46) and outcome-based praise (M-IoU of 0.68); and (3) our optimally fine-tuned GPT-3.5 model achieved M-IoU scores of 0.64 for effort-based praise and 0.84 for outcome-based praise, aligning with the satisfaction levels evaluated by human coders. Our results show promise for using GPT models to provide feedback that targets the specific elements of learners' open-ended responses that are desirable or could be improved.
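
The abstract names a Modified Intersection over Union (M-IoU) score but does not define the modification. As a point of reference only, the sketch below computes the plain, unmodified intersection over union between a predicted highlight and a human-annotated highlight, each given as a list of token indices. The function name `token_iou`, the example sentence, the index lists, and the convention of returning 1.0 when both highlights are empty are all illustrative assumptions and are not taken from the paper.

```python
def token_iou(predicted, reference):
    """Plain (unmodified) intersection over union of two token-index collections.

    NOTE: this is the standard IoU, not the paper's M-IoU, whose exact
    definition is not given in the abstract.
    """
    pred, ref = set(predicted), set(reference)
    union = pred | ref
    if not union:
        # Assumed convention: nothing to highlight and nothing highlighted.
        return 1.0
    return len(pred & ref) / len(union)


# Hypothetical tutor response, split on whitespace into 9 tokens.
tokens = "Great job you worked really hard on this problem".split()

reference = [2, 3, 4, 5]      # human highlight: "you worked really hard"
predicted = [3, 4, 5, 6, 7]   # model highlight: "worked really hard on this"

score = token_iou(predicted, reference)
print(f"IoU = {score:.2f}")   # 3 shared tokens out of 6 in the union
```

Plain IoU penalizes extra and missing tokens equally; a modified variant would presumably weight them differently, which is something to check against the paper itself.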

URL

https://arxiv.org/abs/2405.00291

PDF

https://arxiv.org/pdf/2405.00291.pdf

