Don't Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered

2025-10-09 17:07:30
Jason Jabbour, Dong-Ki Kim, Max Smith, Jay Patrikar, Radhika Ghosal, Youhui Wang, Ali Agha, Vijay Janapa Reddi, Shayegan Omidshafiei

Abstract

Vision-Language-Action (VLA) models have advanced robotic capabilities but remain challenging to deploy on resource-limited hardware. Pruning has enabled efficient compression of large language models (LLMs), yet it is largely understudied in robotics. Surprisingly, we observe that pruning VLA models leads to drastic degradation and increased safety violations. We introduce GLUESTICK, a post-pruning recovery method that restores much of the original model's functionality while retaining sparsity benefits. Our method performs a one-time interpolation between the dense and pruned models in weight-space to compute a corrective term. This correction is used during inference by each pruned layer to recover lost capabilities with minimal overhead. GLUESTICK requires no additional training, is agnostic to the pruning algorithm, and introduces a single hyperparameter that controls the tradeoff between efficiency and accuracy. Across diverse VLA architectures and tasks in manipulation and navigation, GLUESTICK achieves competitive memory efficiency while substantially recovering success rates and reducing safety violations. Additional material can be found at: this https URL.
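The mechanism described above — a one-time weight-space comparison of the dense and pruned models that yields a corrective term, which each pruned layer then applies at inference — can be sketched roughly as follows. This is an illustrative guess, not the paper's implementation: it assumes the corrective term is a low-rank factorization of the dense-minus-pruned weight difference, with the retained rank playing the role of the single efficiency/accuracy hyperparameter. The function names and the SVD-based construction are hypothetical.

```python
import numpy as np

def compute_correction(W_dense, W_pruned, rank):
    """One-time, training-free step (hypothetical construction):
    factor the weight-space difference between the dense and pruned
    layer into a rank-`rank` correction, then discard the dense weights."""
    delta = W_dense - W_pruned                     # what pruning removed
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the top-`rank` directions; `rank` trades memory for fidelity.
    return U[:, :rank] * S[:rank], Vt[:rank, :]    # (out, r), (r, in)

def corrected_forward(x, W_pruned, correction):
    """Inference-time path for one pruned layer: the cheap sparse matmul
    plus a low-rank correction recovering (part of) the lost signal."""
    US, Vt = correction
    return x @ W_pruned.T + (x @ Vt.T) @ US.T
```

With the rank set to the full rank of the difference, the corrected output matches the dense layer exactly; shrinking the rank reduces the per-layer memory and compute overhead at the cost of recovery fidelity, which is the tradeoff the abstract's single hyperparameter describes.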

URL

https://arxiv.org/abs/2510.08464

PDF

https://arxiv.org/pdf/2510.08464.pdf

