Paper Reading AI Learner

Panther: Faster and Cheaper Computations with Randomized Numerical Linear Algebra

2026-01-21 21:23:00
Fahd Seddik, Abdulrahman Elbedewy, Gaser Sami, Mohamed Abdelmoniem, Yahia Zakaria

Abstract

Training modern deep learning models is increasingly constrained by GPU memory and compute limits. While Randomized Numerical Linear Algebra (RandNLA) offers proven techniques to compress these models, the lack of a unified, production-grade library has prevented their widespread adoption. We present Panther, a PyTorch-compatible library that consolidates established RandNLA algorithms into a single high-performance framework. Panther provides efficient, drop-in replacements for standard components, including sketched linear layers, 2D convolution, multi-head attention, and randomized matrix decompositions (such as pivoted CholeskyQR). Through a custom C++/CUDA backend (pawX), Panther offers an optimized implementation that runs on both CPUs and GPUs. We demonstrate the effectiveness of RandNLA techniques and Panther's ease of adoption. By replacing standard PyTorch linear layers with Panther layers (requiring only a few lines of code), we achieve significant memory savings (up to 75%) on BERT while maintaining comparable loss. Source code is available (MIT License) at this https URL, along with a demonstration video at this https URL.
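The core idea behind a sketched linear layer, as described in the abstract, is to replace a dense weight matrix with a low-rank factorization obtained via random projection. The following is a generic NumPy illustration of that RandNLA technique (a randomized range finder in the style of Halko, Martinsson, and Tropp), not Panther's actual API; all variable names, shapes, and the chosen rank are assumptions for the example. With d_in = d_out = 512 and rank 64, the factors store a quarter of the original parameters, matching the up-to-75% savings figure quoted above.

```python
# Illustrative sketch (NOT Panther's API): compress a dense linear
# layer's weight via a randomized range finder. Shapes and rank are
# assumptions chosen so the storage saving works out to 75%.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 512, 512, 64                         # rank << min(d_in, d_out)
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)   # dense weight matrix

# Randomized range finder: sample W's column space with a Gaussian
# test matrix, then orthonormalize the sample.
Omega = rng.standard_normal((d_in, rank))
Q, _ = np.linalg.qr(W @ Omega)   # (d_out, rank) orthonormal basis
B = Q.T @ W                      # (rank, d_in) projection, so W ≈ Q @ B

# The compressed layer stores Q and B instead of W.
stored = Q.size + B.size
savings = 1.0 - stored / W.size
print(f"parameters: {W.size} -> {stored} ({savings:.0%} saved)")

# Forward pass uses two skinny matmuls: y = Q @ (B @ x).
# This is a good approximation when W is (numerically) low-rank;
# for a full-rank W like the random one above it is lossy.
x = rng.standard_normal(d_in)
y_sketch = Q @ (B @ x)
```

The approximation quality depends on how quickly W's singular values decay; trained network weights are often approximately low-rank, which is what makes this kind of sketching attractive in practice.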


URL

https://arxiv.org/abs/2601.15473

PDF

https://arxiv.org/pdf/2601.15473.pdf

