Abstract
While excellent at transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP, where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent, sophisticated combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at this https URL.
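The importance criterion in (i) can be illustrated with a minimal sketch: each weight is scored by its magnitude, scaled by the saliency of the two neurons it connects. Note this is a hedged illustration, not the paper's exact formulation; here neuron saliency is assumed to be the mean magnitude of incident weights, and pruning is thresholded per layer, a simplification of the multimodal, distribution-aware budgeting in (ii).

```python
import numpy as np

def neuron_saliency(W):
    # Saliency of input/output neurons as the mean magnitude of their
    # incident weights (an assumption for illustration only; the paper
    # defines its own saliency measure).
    return np.abs(W).mean(axis=1), np.abs(W).mean(axis=0)

def multiflow_scores(W):
    # Importance of each weight: its magnitude, scaled by the saliency
    # of the two neurons it connects (the "information flow" idea).
    s_in, s_out = neuron_saliency(W)
    return np.abs(W) * np.outer(s_in, s_out)

def prune_layer(W, ratio):
    # Zero out the `ratio` fraction of weights with the lowest scores.
    # A per-layer threshold is a simplification: the paper instead lets
    # the emergent (multimodal) parameter distribution set the budgets.
    scores = multiflow_scores(W)
    k = int(round(ratio * W.size))
    if k == 0:
        return W.copy()
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]
    return W * (scores > thresh)
```

For example, on a 2x2 weight matrix `W = [[1.0, -2.0], [0.5, 3.0]]`, pruning at ratio 0.5 keeps the two weights whose magnitude-times-saliency scores are highest and zeroes the rest.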
URL
https://arxiv.org/abs/2404.05621