Abstract
The use of automatic short answer grading (ASAG) models may help alleviate the time burden of grading while encouraging educators to frequently incorporate open-ended items in their curricula. However, current state-of-the-art ASAG models are large neural networks (NNs) often described as "black boxes," providing no explanation for which characteristics of an input are important for the produced output. This inexplicable nature can be frustrating to teachers and students when trying to interpret, or learn from, an automatically generated grade. To create a powerful yet intelligible ASAG model, we experiment with a type of model called a Neural Additive Model (NAM) that combines the performance of a NN with the explainability of an additive model. We use a Knowledge Integration (KI) framework from the learning sciences to guide feature engineering to create inputs that reflect whether a student includes certain ideas in their response. We hypothesize that indicating the inclusion (or exclusion) of predefined ideas as features will be sufficient for the NAM to have good predictive power and interpretability, as this may guide a human scorer using a KI rubric. We compare the performance of the NAM with that of another explainable model, logistic regression, using the same features, and with that of a non-explainable neural model, DeBERTa, that does not require feature engineering.
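The additive structure described above can be sketched briefly. The following is a minimal, illustrative forward pass of a NAM (untrained, with random weights) over hypothetical binary "idea inclusion" features; it is not the authors' implementation, only a sketch of the general NAM design in which each feature gets its own small subnetwork and the per-feature outputs are summed, so each feature's contribution to the grade can be read off directly.

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureNet:
    """Tiny per-feature subnetwork: scalar input -> ReLU hidden layer -> scalar output."""
    def __init__(self, hidden=8):
        self.w1 = rng.normal(size=(1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, x):
        # x: (n, 1) column holding a single feature for n responses
        h = np.maximum(0.0, x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2  # (n, 1) additive contribution

class NAM:
    """score(x) = sigmoid(bias + sum_j f_j(x_j)); each f_j is one FeatureNet."""
    def __init__(self, n_features, hidden=8):
        self.nets = [FeatureNet(hidden) for _ in range(n_features)]
        self.bias = 0.0

    def contributions(self, X):
        # Per-feature contributions: this matrix IS the explanation, since the
        # prediction is just their sum (plus bias) pushed through a sigmoid.
        return np.hstack([net(X[:, [j]]) for j, net in enumerate(self.nets)])

    def predict_proba(self, X):
        logits = self.bias + self.contributions(X).sum(axis=1)
        return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical binary features: does each of 3 responses include each of 3 KI ideas?
X = np.array([[1, 0, 1],
              [0, 0, 0],
              [1, 1, 1]], dtype=float)
model = NAM(n_features=3)
probs = model.predict_proba(X)     # predicted probability of a correct response
contrib = model.contributions(X)   # (3, 3): how much each idea moved each score
```

Because the model is additive, `contrib[i, j]` directly reports how the presence or absence of idea `j` shifted the logit for response `i`, which is what makes the NAM interpretable relative to a monolithic NN.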
URL
https://arxiv.org/abs/2405.00489