Can GPT-4 do L2 analytic assessment?

2024-04-29 10:00:00

Stefano Bannò, Hari Krishna Vydana, Kate M. Knill, Mark J. F. Gales

arXiv_CL

Abstract

Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of large language models presents new opportunities for automating the evaluation of specific aspects of L2 writing proficiency. In this paper, we perform a series of experiments using GPT-4 in a zero-shot fashion on a publicly available dataset annotated with holistic scores based on the Common European Framework of Reference and aim to extract detailed information about their underlying analytic components. We observe significant correlations between the automatically predicted analytic scores and multiple features associated with the individual proficiency components.
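The correlation analysis mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient or which features were used, so the choice of Spearman rank correlation here is an assumption, and the names `predicted_vocab_scores` and `lexical_diversity` as well as all values are hypothetical toy data, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (NOT from the paper): one predicted analytic
# score and one hand-picked feature value per essay.
predicted_vocab_scores = [2.0, 3.5, 3.0, 4.5, 5.0, 1.5]
lexical_diversity = [0.41, 0.55, 0.58, 0.63, 0.71, 0.38]

rho = spearman(predicted_vocab_scores, lexical_diversity)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient is a natural fit when scores are ordinal, as proficiency bands are, but a Pearson coefficient or a library routine with significance testing could be substituted.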

URL

https://arxiv.org/abs/2404.18557

PDF

https://arxiv.org/pdf/2404.18557.pdf
