StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

2024-04-30 08:01:49

Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

arXiv_CL

Abstract

Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning under data scarcity. However, this approach depends heavily on the quality of prompt initialization and always exhibits large variability across runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and the soft prompt as separate inputs to mitigate the noise introduced by prompt initialization. Furthermore, we optimize soft prompts with contrastive learning to utilize class-aware information during training and maintain model performance. Experimental results demonstrate that StablePT outperforms state-of-the-art methods by 7.20% in accuracy and reduces the standard deviation by 2.02 on average. Furthermore, extensive experiments underscore its robustness and stability across 7 datasets covering various tasks.
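
The abstract does not say which contrastive objective is used to optimize the soft prompts. One common choice for class-aware contrastive learning is a supervised contrastive loss, which pulls same-class representations together and pushes different-class ones apart. The sketch below is an illustration under that assumption, not the authors' implementation; the function name `supervised_contrastive_loss`, the `temperature` default, and the toy vectors are all hypothetical.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Hypothetical helper for illustration; not taken from the paper.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives are the other examples that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: two tight clusters in 2-D.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(vectors, [0, 0, 1, 1])
misaligned = supervised_contrastive_loss(vectors, [0, 1, 0, 1])
```

In this toy setup the loss is small when the labels match the geometric clusters and large when they cut across them, which is the signal a class-aware objective would provide during training.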

Abstract (translated)

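Large language models have shown, through prompting, that they can become effective few-shot learners when data is scarce, upending the learning paradigm. However, this approach depends heavily on the initial quality of the prompt and exhibits large variability across runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to broader real-world applications. To address this problem, we propose treating the hard prompt and the soft prompt as separate inputs to reduce the noise introduced by prompt initialization. In addition, we optimize the soft prompts with contrastive learning so that class-aware information is used during training to maintain model performance. Experimental results show that StablePT improves accuracy by 7.20% over state-of-the-art methods and reduces the standard deviation by 2.02 on average. Extensive experiments further confirm its robustness and stability across a variety of tasks.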

URL

https://arxiv.org/abs/2404.19335

PDF

https://arxiv.org/pdf/2404.19335.pdf
