Paper Reading AI Learner

Improving Socratic Question Generation using Data Augmentation and Preference Optimization

2024-03-01 00:08:20
Nischal Ashok Kumar, Andrew Lan


The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem. Although this method has been shown to significantly improve student learning outcomes, it remains a complex, labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., questions that directly reveal the solution to the problem or that are irrelevant or premature. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questioning dataset for student code debugging show that a DPO-optimized 7B Llama 2 model can effectively avoid generating invalid questions and, as a result, outperforms existing state-of-the-art prompting methods.
