Paper Reading AI Learner

Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering

2026-01-22 03:57:47
Lili Chen, Winn Wing-Yiu Chow, Stella Peng, Bencheng Fan, Sachitha Bandara

Abstract

PURPOSE OR GOAL: This study investigates how GenAI can be integrated with a criterion-referenced grading framework to improve the efficiency and quality of grading for mathematical assessments in engineering. It specifically explores the challenges demonstrators face with manual, model-solution-based grading and how a GenAI-supported system can be designed to reliably identify student errors, provide high-quality feedback, and support human graders. The research also examines human graders' perceptions of the effectiveness of this GenAI-assisted approach.

ACTUAL OR ANTICIPATED OUTCOMES: The study found that GenAI achieved an overall grading accuracy of 92.5%, comparable to that of two experienced human graders. The two researchers, who also served as subject demonstrators, perceived the GenAI tool as a helpful second reviewer that improved accuracy by catching small errors and provided more complete feedback than they could produce manually. A central outcome was a significant enhancement of formative feedback. However, they noted that the GenAI tool is not yet reliable enough for autonomous use, especially with unconventional solutions.

CONCLUSIONS/RECOMMENDATIONS/SUMMARY: This study demonstrates that GenAI, when paired with a structured, criterion-referenced framework using binary questions, can grade engineering mathematical assessments with an accuracy comparable to that of human experts. Its primary contribution is a novel methodological approach that embeds the generation of high-quality, scalable formative feedback directly into the assessment workflow. Future work should investigate student perceptions of GenAI grading and feedback.
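The abstract names the core mechanism (criterion-referenced rubric criteria expressed as binary questions) but does not describe its implementation. The following Python sketch is a hypothetical illustration of that general idea, not the authors' system. Every name in it (`BinaryQuestion`, `Criterion`, `grade_criterion`, `agreement`) is invented for illustration, and two rules are assumptions: a criterion counts as met only when all of its binary questions are answered "yes", and "accuracy" is treated as simple agreement with a reference grader's decisions. In the workflow the abstract describes, the yes/no answers would come from the GenAI grader; here they are supplied by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryQuestion:
    """A single yes/no check within a rubric criterion (hypothetical structure)."""
    text: str
    feedback_if_no: str  # formative comment returned when the check fails


@dataclass(frozen=True)
class Criterion:
    """A rubric criterion decomposed into binary questions (hypothetical)."""
    name: str
    questions: tuple[BinaryQuestion, ...]


def grade_criterion(criterion: Criterion, answers: list[bool]) -> tuple[bool, list[str]]:
    """Return (met, feedback) for one criterion.

    Assumption: the criterion is met only if every binary question is
    answered "yes"; each "no" contributes its formative feedback comment.
    The answers would come from a GenAI grader; here they are passed in directly.
    """
    if len(answers) != len(criterion.questions):
        raise ValueError("need exactly one answer per question")
    feedback = [q.feedback_if_no for q, ok in zip(criterion.questions, answers) if not ok]
    return (len(feedback) == 0, feedback)


def agreement(ai: list[bool], reference: list[bool]) -> float:
    """Fraction of binary decisions on which the AI matches a reference grader.

    Assumption: "accuracy" is measured as simple agreement; the paper's
    exact definition is not given in the abstract.
    """
    if len(ai) != len(reference) or not ai:
        raise ValueError("decision lists must be non-empty and equal in length")
    return sum(a == r for a, r in zip(ai, reference)) / len(ai)


if __name__ == "__main__":
    # Hypothetical criterion for an integration question.
    setup = Criterion(
        "Sets up the integral",
        (
            BinaryQuestion("Are the limits of integration correct?", "Re-check the limits."),
            BinaryQuestion("Is the integrand correct?", "Re-derive the integrand."),
        ),
    )
    print(grade_criterion(setup, [True, False]))  # (False, ['Re-derive the integrand.'])
    print(agreement([True, False, True, True], [True, False, False, True]))  # 0.75
```

Splitting a criterion into yes/no checks is what makes the feedback fall out of the grading step: every failed check already carries a specific comment, so no separate feedback-writing pass is needed. The numbers in the usage example are made up and are not data from the study.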

URL

https://arxiv.org/abs/2601.15626

PDF

https://arxiv.org/pdf/2601.15626.pdf