Paper Reading AI Learner

Put Teacher in Student's Shoes: Cross-Distillation for Ultra-compact Model Compression Framework

2025-07-07 03:38:09
Maolin Wang, Jun Chu, Sicong Xie, Xiaoling Zang, Yao Zhao, Wenliang Zhong, Xiangyu Zhao

Abstract

In the era of mobile computing, deploying efficient Natural Language Processing (NLP) models in resource-constrained edge settings presents significant challenges, particularly in environments requiring strict privacy compliance, real-time responsiveness, and diverse multi-tasking capabilities. These challenges create a fundamental need for ultra-compact models that maintain strong performance across various NLP tasks while adhering to stringent memory constraints. To this end, we introduce the Edge ultra-lIte BERT framework (EI-BERT) with a novel cross-distillation method. EI-BERT compresses models through a comprehensive pipeline of hard token pruning, cross-distillation, and parameter quantization. Specifically, the cross-distillation method uniquely positions the teacher model to understand the student model's perspective, ensuring efficient knowledge transfer through parameter integration and the mutual interplay between the two models. Through extensive experiments, we obtain a BERT-based model of only 1.91 MB, the smallest to date for Natural Language Understanding (NLU) tasks. This ultra-compact model has been deployed across multiple scenarios within the Alipay ecosystem, demonstrating significant improvements in real-world applications. For example, it has been integrated into Alipay's live Edge Recommendation system since January 2024, currently serving the app's recommendation traffic across 8.4 million daily active devices.
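The abstract names three compression stages but does not specify how token importance is scored, how the teacher and student parameters are integrated during cross-distillation, or which bit width is used for quantization. As a rough orientation only, the sketch below shows generic textbook versions of each stage: top-k hard token pruning given precomputed importance scores, the standard temperature-softened distillation loss (T² · KL(teacher ‖ student)), and symmetric per-tensor int8 quantization. This is not EI-BERT's cross-distillation, which additionally has the teacher adapt to the student's perspective and is not reproduced here; all function names (`prune_tokens`, `kd_loss`, `quantize_int8`, `dequantize`) and the example scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def prune_tokens(tokens: Sequence[str], scores: Sequence[float], keep: int) -> List[str]:
    """Generic hard token pruning (hypothetical helper): keep the `keep`
    highest-scoring tokens and drop the rest, preserving original order.
    How EI-BERT scores token importance is not stated in the abstract."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """Standard (Hinton-style) distillation loss, NOT the paper's
    cross-distillation: T^2 * KL(teacher || student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127].
    The 8-bit width is an assumption; the abstract does not give one."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    tokens = ["[CLS]", "the", "movie", "was", "great", "[SEP]"]
    scores = [1.0, 0.1, 0.7, 0.2, 0.9, 1.0]  # made-up importance scores
    print(prune_tokens(tokens, scores, keep=4))  # ['[CLS]', 'movie', 'great', '[SEP]']
    print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 (identical outputs)
    q, scale = quantize_int8([0.5, -1.27, 0.0])
    print(q)  # [50, -127, 0]
```

Storing each weight as one int8 code costs one byte instead of the four bytes of float32, plus one float scale per tensor; the rounding error per weight is at most half the scale.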


URL

https://arxiv.org/abs/2507.04636

PDF

https://arxiv.org/pdf/2507.04636.pdf

