Abstract
Zero-shot prompting techniques have significantly improved the performance of Large Language Models (LLMs), yet we lack a clear understanding of why these prompts are so effective. For example, in the prompt "Let's think step-by-step," is "think" or "step-by-step" more crucial to its success? Existing interpretability methods, such as gradient-based and attention-based approaches, are computationally intensive and restricted to open-source models. We introduce the ZIP score (Zero-shot Importance of Perturbation score), a versatile metric based on systematic input word perturbations that applies to both open- and closed-source models. Our experiments across four recent LLMs, seven widely used prompts, and several tasks reveal interesting patterns in word importance. For instance, while both "step-by-step" and "think" show high ZIP scores, which one is more influential depends on the model and task. We validate our method using controlled experiments and compare our results with human judgments, finding that proprietary models align more closely with human intuition regarding word significance. These findings enhance our understanding of LLM behavior and contribute to developing more effective zero-shot prompts and improved model analysis.
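To make the idea of perturbation-based word importance concrete, here is a minimal sketch. It is not the paper's ZIP score, whose exact definition and perturbation types the abstract does not give. It assumes the simplest possible perturbation (deleting one word at a time) and treats the model as a black box behind a hypothetical `evaluate` function that returns a task score for a prompt. The names `word_importance` and `toy_evaluate` and the toy weights are illustrative assumptions only.

```python
from typing import Callable, List, Tuple


def word_importance(
    prompt: str, evaluate: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Leave-one-word-out importance (an assumed, simplified perturbation).

    For each word, remove it from the prompt and record how much the
    score returned by `evaluate` drops relative to the unperturbed prompt.
    Only black-box access to the model is needed.
    """
    words = prompt.split()
    baseline = evaluate(prompt)
    results = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        results.append((word, baseline - evaluate(perturbed)))
    return results


def toy_evaluate(prompt: str) -> float:
    """Hypothetical stand-in for task accuracy under a given prompt.

    A real setup would query an LLM on a benchmark; the weights here
    are made up purely so the sketch runs without a model.
    """
    weights = {"think": 0.3, "step-by-step": 0.5}
    return sum(weights.get(w.lower(), 0.0) for w in prompt.split())


scores = word_importance("Let's think step-by-step", toy_evaluate)
for word, drop in scores:
    print(f"{word}: score drop {drop:.2f}")
```

Because the sketch only calls `evaluate` on perturbed prompts, the same loop would work with an API-only model, which is the property the abstract emphasizes for closed-source LLMs.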
URL
https://arxiv.org/abs/2502.03418