Abstract
Few-shot learning allows pre-trained language models to adapt to downstream tasks using a limited number of training examples. However, practical applications are limited when all model parameters must be optimized. In this work we apply a new technique for parameter-efficient few-shot learning while adopting a strict definition of parameter efficiency. Our training method combines 1) intermediate training by reformulating natural language tasks as entailment tasks \cite{wang_entailment_2021} and 2) differentiable optimization of template and label tokens \cite{zhang_differentiable_2021}. We quantify the tradeoff between parameter efficiency and performance in the few-shot regime and propose a simple, model-agnostic approach that can be extended to any task. By achieving competitive performance while optimizing only 3\% of a model's parameters and allowing for batched inference, we enable more efficient practical deployment of models.
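To make the two ingredients concrete, the sketch below shows the prompt-tuning half in the generic soft-prompt style: the backbone is frozen and only a small block of trainable template embeddings is updated, so trainable parameters remain a few percent of the total. This is our illustration, not the authors' released code or the exact procedure of \cite{zhang_differentiable_2021}; the model name, prompt length, learning rate, and example sentences are illustrative assumptions. The entailment half of the method pairs each input with a label-describing hypothesis, as in the usage example at the end.

```python
# Minimal sketch of differentiable template-token ("soft prompt") tuning
# with a frozen backbone. All hyperparameters here are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "roberta-base"  # illustrative backbone choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Freeze every backbone parameter; only the prompt will train.
for p in model.parameters():
    p.requires_grad = False

# Trainable soft template tokens, initialized from the embedding table.
embed = model.get_input_embeddings()  # nn.Embedding
n_prompt = 20                         # illustrative prompt length
prompt = torch.nn.Parameter(embed.weight[:n_prompt].detach().clone())

def forward_with_prompt(input_ids, attention_mask):
    # Prepend the soft prompt to the token embeddings.
    tok_embeds = embed(input_ids)                     # (B, T, H)
    batch = input_ids.size(0)
    soft = prompt.unsqueeze(0).expand(batch, -1, -1)  # (B, P, H)
    inputs_embeds = torch.cat([soft, tok_embeds], dim=1)
    mask = torch.cat(
        [torch.ones(batch, n_prompt, dtype=attention_mask.dtype),
         attention_mask],
        dim=1,
    )
    return model(inputs_embeds=inputs_embeds, attention_mask=mask)

# Only the prompt embeddings receive gradient updates.
optimizer = torch.optim.AdamW([prompt], lr=1e-3)

# Illustrative usage: encode an input paired with an entailment-style
# verbalization of the label, in the spirit of \cite{wang_entailment_2021}.
enc = tokenizer(
    "A tense, gripping thriller.", "This review is positive.",
    return_tensors="pt",
)
out = forward_with_prompt(enc["input_ids"], enc["attention_mask"])
print(out.logits.shape)  # (1, n_prompt + seq_len, vocab_size)
```

Because the backbone is frozen and shared, a single copy of the model can serve many tasks, each carrying only its small prompt block, which is what makes batched inference across tasks practical.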
URL
https://arxiv.org/abs/2301.13345