Paper Reading AI Learner

Explainable Automatic Grading with Neural Additive Models

2024-05-01 12:56:14
Aubrey Condor, Zachary Pardos

Abstract

The use of automatic short answer grading (ASAG) models may help alleviate the time burden of grading while encouraging educators to incorporate open-ended items into their curricula more frequently. However, current state-of-the-art ASAG models are large neural networks (NNs) often described as "black boxes," providing no explanation of which characteristics of an input are important for the produced output. This opacity can frustrate teachers and students trying to interpret, or learn from, an automatically generated grade. To create a powerful yet intelligible ASAG model, we experiment with a Neural Additive Model (NAM), which combines the predictive performance of a NN with the explainability of an additive model. We use a Knowledge Integration (KI) framework from the learning sciences to guide feature engineering, creating inputs that reflect whether a student includes certain ideas in their response. We hypothesize that indicating the inclusion (or exclusion) of predefined ideas as features will be sufficient for the NAM to have good predictive power and interpretability, as such features would guide a human scorer using a KI rubric. We compare the performance of the NAM with that of another explainable model, logistic regression, using the same features, and with a non-explainable neural model, DeBERTa, that does not require feature engineering.
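The additive structure that makes a NAM interpretable can be seen in a minimal sketch: each input feature (here, hypothetical binary indicators for whether a KI-rubric idea appears in a student response) is fed through its own tiny subnetwork, and the final grade probability depends only on the sum of the per-feature outputs. This is an illustrative toy with random weights, not the authors' implementation or the original NAM code.

```python
import numpy as np

rng = np.random.default_rng(0)

class NeuralAdditiveModel:
    """Toy Neural Additive Model: one independent MLP per input feature;
    per-feature outputs are summed with a bias and squashed to a
    probability, so each feature's contribution can be read off directly."""

    def __init__(self, n_features, hidden=8):
        # One subnetwork (1 -> hidden -> 1) per feature, untrained here.
        self.w1 = rng.normal(size=(n_features, 1, hidden))
        self.b1 = np.zeros((n_features, hidden))
        self.w2 = rng.normal(size=(n_features, hidden, 1))
        self.bias = 0.0

    def feature_contributions(self, x):
        # x: (n_features,) vector, e.g. binary KI-idea indicators.
        contribs = []
        for j, xj in enumerate(x):
            h = np.maximum(0.0, xj * self.w1[j, 0] + self.b1[j])  # ReLU
            contribs.append(float(h @ self.w2[j][:, 0]))
        return contribs  # one interpretable score per feature

    def predict_proba(self, x):
        # Additive structure: the logit is a plain sum of per-feature terms.
        logit = self.bias + sum(self.feature_contributions(x))
        return 1.0 / (1.0 + np.exp(-logit))

nam = NeuralAdditiveModel(n_features=4)
x = np.array([1.0, 0.0, 1.0, 1.0])  # hypothetical "idea present" indicators
p = nam.predict_proba(x)
```

Because the logit is an exact sum of the per-feature terms, `feature_contributions` gives the same kind of per-idea attribution a teacher could inspect, which is the explainability property the abstract contrasts with DeBERTa.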

URL

https://arxiv.org/abs/2405.00489

PDF

https://arxiv.org/pdf/2405.00489.pdf
