Abstract
While multiple-choice questions (MCQs) are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task. Recent advances in generative AI offer an opportunity to automate MCQ generation efficiently. However, assessing the actual quality and reliability of generated MCQs has received limited attention, particularly regarding cases where generation fails. This aspect becomes especially important when the generated MCQs are meant to be applied in real-world settings. Additionally, most MCQ generation studies focus on English, leaving other languages underexplored. This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language. Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels. We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses, assessing their suitability for elementary school students. Our results show that current models can generate MCQs of comparable quality to human-authored ones, but we identify issues related to semantic clarity and answerability. Challenges also remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.
URL
https://arxiv.org/abs/2506.15598