From Model to Classroom: Evaluating Generated MCQs for Portuguese with Narrative and Difficulty Concerns

2025-06-18 16:19:46
Bernardo Leite, Henrique Lopes Cardoso, Pedro Pinto, Abel Ferreira, Luís Abreu, Isabel Rangel, Sandra Monteiro

Abstract

While multiple-choice questions (MCQs) are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task. Recent advances in generative AI provide an opportunity to automate MCQ generation efficiently. However, assessing the actual quality and reliability of generated MCQs has received limited attention, particularly regarding cases where generation fails. This aspect becomes especially important when the generated MCQs are meant to be applied in real-world settings. Additionally, most MCQ generation studies focus on English, leaving other languages underexplored. This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language. Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels. We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses to assess their suitability for elementary school students. Our results show that current models can generate MCQs of comparable quality to human-authored ones. However, we identify issues related to semantic clarity and answerability. Challenges also remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.

URL

https://arxiv.org/abs/2506.15598

PDF

https://arxiv.org/pdf/2506.15598.pdf

