
The Few-shot Dilemma: Over-prompting Large Language Models

2025-09-16 16:00:06
Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler

Abstract

Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods - random sampling, semantic embedding, and TF-IDF vectors - and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, LLaMA-3.1, LLaMA-3.2, and Mistral. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirement analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem, thus surpassing the state-of-the-art by 1% in classifying functional and non-functional requirements.
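The abstract's selection step (TF-IDF-ranked, stratified few-shot examples) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the choice to include the query in the IDF corpus, and the names `tfidf_vectors`, `cosine`, and `select_few_shot` are all assumptions made for illustration. The idea is to rank labeled examples by TF-IDF cosine similarity to the query and keep the top `per_class` examples from each class, so every class stays represented as the example count grows.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines usually do more.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_few_shot(query, pool, per_class):
    """Pick the `per_class` most similar examples from each label in `pool`.

    `pool` is a list of (text, label) pairs. Assumption: the query is added
    to the corpus when computing IDF weights.
    """
    texts = [text for text, _ in pool]
    vectors = tfidf_vectors(texts + [query])
    query_vec = vectors[-1]

    # Stratify: group candidates by label before ranking.
    by_label = defaultdict(list)
    for (text, label), vec in zip(pool, vectors):
        by_label[label].append((cosine(query_vec, vec), text))

    selected = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda pair: pair[0], reverse=True)
        selected.extend((text, label) for _, text in ranked[:per_class])
    return selected


# Hypothetical toy pool: F = functional, NF = non-functional requirement.
pool = [
    ("The system shall let users reset passwords", "F"),
    ("The application shall export reports as PDF", "F"),
    ("The system shall encrypt stored data", "NF"),
    ("Pages shall load within two seconds", "NF"),
]
examples = select_few_shot("The system shall encrypt user passwords", pool, per_class=1)
for text, label in examples:
    print(f"{label}: {text}")
```

Sweeping `per_class` upward and scoring each model on a held-out set is one way to look for the point where adding examples stops helping, which is the over-prompting effect the abstract describes.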

URL

https://arxiv.org/abs/2509.13196

PDF

https://arxiv.org/pdf/2509.13196.pdf

