
The Potential of LLMs in Medical Education: Generating Questions and Answers for Qualification Exams

2024-10-31 09:33:37
Yunqi Zhu, Wen Tang, Ying Sun, Xuebing Yang

Abstract

Recent research on large language models (LLMs) has primarily focused on their adaptation and application in specialized domains. In the medical field, LLMs are mainly applied to tasks such as automated medical report generation, summarization, diagnostic reasoning, and question-and-answer interactions between doctors and patients. Becoming a good teacher is a more formidable challenge than becoming a good student, and this study pioneers the application of LLMs in medical education. In this work, we investigate the extent to which LLMs can generate medical qualification exam questions and corresponding answers from few-shot prompts. Using a real-world Chinese dataset of elderly chronic diseases, we tasked eight widely used LLMs (ERNIE 4, ChatGLM 4, Doubao, Hunyuan, Spark 4, Qwen, Llama 3, and Mistral) with generating open-ended questions and answers based on a sampled subset of admission reports. We then engaged medical experts to manually evaluate these questions and answers across multiple dimensions. The study found that LLMs given few-shot prompts can effectively mimic real-world medical qualification exam questions, although there is room for improvement in the correctness, evidence-based statements, and professionalism of the generated answers. LLMs also demonstrate a decent ability to correct reference answers. Given the immense potential of artificial intelligence in the medical field, generating questions and answers for medical qualification exams aimed at medical students, interns, and residents can be a significant focus of future research.
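
As a rough illustration of the few-shot setup described in the abstract, the sketch below assembles a prompt from a handful of worked demonstrations followed by a new admission report. It is not the authors' prompt: the `Example` class, the `build_few_shot_prompt` function, the instruction wording, and the field labels are all hypothetical, and the paper's actual template may differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One demonstration: a report excerpt with a question/answer pair (hypothetical schema)."""
    report: str
    question: str
    answer: str


def build_few_shot_prompt(examples: list[Example], new_report: str) -> str:
    """Join an instruction, the demonstrations, and the new report into one prompt string."""
    # Instruction wording is an assumption made for illustration only.
    parts = [
        "Write one open-ended medical qualification exam question "
        "and a reference answer for the admission report."
    ]
    # Each demonstration shows the model the expected report -> question -> answer shape.
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\nReport: {ex.report}\n"
            f"Question: {ex.question}\nAnswer: {ex.answer}"
        )
    # The prompt ends on an open "Question:" so the model continues from there.
    parts.append(f"Report: {new_report}\nQuestion:")
    return "\n\n".join(parts)
```

The resulting string would be sent to whichever model is being evaluated; ending on an open `Question:` field is one common way to steer the model toward the same format as the demonstrations.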


URL

https://arxiv.org/abs/2410.23769

PDF

https://arxiv.org/pdf/2410.23769.pdf

