Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation

2024-04-24 12:48:06
Maja Stahl, Leon Biermann, Andreas Nehring, Henning Wachsmuth

Abstract

Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback ultimately remains low.
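
The abstract does not give the actual prompt wording, so the following is only a minimal sketch of what a joint scoring-and-feedback prompt could look like. It assumes a hypothetical plain-text answer format with `Score:` and `Feedback:` fields and asks for the score before the feedback (the Chain-of-Thought-inspired ordering, where the score acts as an intermediate step), in zero-shot form (no demonstrations) or few-shot form (with demonstrations). `Demo`, `build_prompt`, `parse_response`, the 1 to 6 scale and all prompt text are illustrative assumptions, not the authors' prompts, and no LLM is called.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Demo:
    """Hypothetical few-shot demonstration: an essay with a gold score and feedback."""
    essay: str
    score: int
    feedback: str


def build_prompt(essay: str, demos: Sequence[Demo] = (), min_score: int = 1,
                 max_score: int = 6, score_first: bool = True) -> str:
    """Build a zero-shot (no demos) or few-shot prompt for joint scoring and feedback.

    score_first=True asks for the score before the feedback, so the score can
    serve as an intermediate reasoning step (assumed, CoT-inspired ordering).
    """
    if score_first:
        task = (f"First rate the essay on a scale from {min_score} to {max_score}, "
                "then write feedback that helps the student improve it.")
        fmt = "Score: <number>\nFeedback: <text>"
    else:
        task = ("First write feedback that helps the student improve the essay, "
                f"then rate it on a scale from {min_score} to {max_score}.")
        fmt = "Feedback: <text>\nScore: <number>"
    parts = [task, f"Answer in this format:\n{fmt}"]
    # Each demonstration shows an essay followed by an answer in the same order.
    for d in demos:
        if score_first:
            answer = f"Score: {d.score}\nFeedback: {d.feedback}"
        else:
            answer = f"Feedback: {d.feedback}\nScore: {d.score}"
        parts.append(f"Essay:\n{d.essay}\n{answer}")
    parts.append(f"Essay:\n{essay}")  # the essay to be scored and commented on
    return "\n\n".join(parts)


def parse_response(text: str) -> Tuple[Optional[int], str]:
    """Extract (score, feedback) from a reply in the assumed format; None/'' if absent."""
    score_match = re.search(r"Score:\s*(\d+)", text)
    # Feedback runs until a following "Score:" line or the end of the reply.
    fb_match = re.search(r"Feedback:\s*(.*?)(?=\n\s*Score:|\Z)", text, re.DOTALL)
    score = int(score_match.group(1)) if score_match else None
    feedback = fb_match.group(1).strip() if fb_match else ""
    return score, feedback
```

For example, `build_prompt(essay, demos=[Demo(...)])` yields a few-shot prompt, and `parse_response("Score: 4\nFeedback: Support your claims with examples.")` returns `(4, "Support your claims with examples.")`. Setting `score_first=False` gives the reverse ordering, which is one simple way to compare whether scoring first changes the feedback.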

URL

https://arxiv.org/abs/2404.15845

PDF

https://arxiv.org/pdf/2404.15845.pdf