Paper Reading AI Learner

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

2024-04-08 15:51:21
Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci


While excellent at transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, preserving the transferable representations already encoded in the pretrained model is a key aspect. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first, gradient-free pruning framework for TA-VLP in which: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent, sophisticated combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at this https URL.
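To make point (i) concrete, the following is a minimal sketch of magnitude-times-information-flow scoring for a single weight matrix. The saliency proxy used here (total magnitude of a neuron's incident weights) and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def node_saliency(W):
    # Illustrative saliency proxy: each neuron's saliency is the total
    # magnitude of the weights incident to it. Returns saliencies for
    # the output neurons (rows) and input neurons (columns).
    return np.abs(W).sum(axis=1), np.abs(W).sum(axis=0)

def flow_scores(W):
    # Score each weight by its own magnitude, modulated by the
    # saliency of the two neurons it connects (the "information flow").
    out_sal, in_sal = node_saliency(W)
    return np.abs(W) * np.outer(out_sal, in_sal)

def prune(W, sparsity):
    # Keep the highest-scoring weights; zero out the rest.
    scores = flow_scores(W)
    k = int(round(W.size * (1.0 - sparsity)))
    thresh = np.partition(scores.ravel(), -k)[-k]
    return np.where(scores >= thresh, W, 0.0)
```

In the full method, per-layer pruning budgets would additionally be shaped by the multimodal parameter distribution of the pretrained VLM (point (ii)); this sketch applies a single global budget for simplicity.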



