Abstract
Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative, guiding students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how, and to what extent, automated essay scoring (AES) can benefit the quality of the generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation underlines the quality of the generated essay feedback, the impact of essay scoring on the feedback ultimately remains low.
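The Chain-of-Thought-inspired idea of scoring first and then generating feedback conditioned on that score can be illustrated with a prompt-construction sketch. This is a hypothetical illustration, not the paper's actual prompts: the function name `build_joint_prompt`, the rubric wording, and the score range are all assumptions.

```python
# Hypothetical sketch of a joint AES + feedback prompt: the LLM is asked to
# assign a score first, then write feedback that reasons from that score.
# All prompt wording here is illustrative, not taken from the paper.

def build_joint_prompt(essay, rubric, examples=None):
    """Assemble a zero-shot (no examples) or few-shot prompt for joint
    essay scoring and feedback generation."""
    parts = [
        "You are an experienced writing tutor.",
        f"Scoring rubric:\n{rubric}",
    ]
    # Optional few-shot examples as (essay excerpt, gold score) pairs.
    for i, (ex_essay, ex_score) in enumerate(examples or [], start=1):
        parts.append(f"Example {i}:\nEssay: {ex_essay}\nScore: {ex_score}")
    parts.append(
        "First assign a score from 1 to 6. Then, reasoning from that score, "
        "write constructive feedback the student can act on."
    )
    # End with the target essay and a cue so the model scores before
    # generating feedback (the Chain-of-Thought-style ordering).
    parts.append(f"Essay:\n{essay}\nScore:")
    return "\n\n".join(parts)

prompt = build_joint_prompt(
    "My summer vacation was memorable because ...",
    "6 = excellent organization and argumentation ... 1 = poor",
    examples=[("A short sample essay.", 4)],
)
print(prompt)
```

Dropping the `examples` argument yields the zero-shot variant; a feedback-only baseline would simply omit the scoring instruction.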
URL
https://arxiv.org/abs/2404.15845