Abstract
Training modern deep learning models is increasingly constrained by GPU memory and compute limits. While Randomized Numerical Linear Algebra (RandNLA) offers proven techniques to compress these models, the lack of a unified, production-grade library has hindered their wide adoption. We present Panther, a PyTorch-compatible library that consolidates established RandNLA algorithms into a single high-performance framework. Panther provides efficient, drop-in replacements for standard components, including sketched linear layers, 2D convolutions, multi-head attention, and randomized matrix decompositions (such as pivoted CholeskyQR). Through a custom C++/CUDA backend (pawX), Panther delivers an optimized implementation that runs on both CPUs and GPUs. We demonstrate the effectiveness of RandNLA techniques and Panther's ease of adoption: replacing standard PyTorch linear layers with Panther layers (requiring only a few lines of code) yields significant memory savings (up to 75%) on BERT while maintaining comparable loss. Source code is available (MIT License) at this https URL, along with a demonstration video at this https URL.
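The abstract does not spell out Panther's internal sketching scheme or API, but the general RandNLA idea behind a "sketched linear layer" can be illustrated independently. The sketch below (NumPy, with a hypothetical `sketch_factorize` helper; all names are illustrative, not Panther's) uses a standard randomized rangefinder: a Gaussian test matrix compresses the weight matrix W into two thin factors Q and B, which are stored in place of W and applied in sequence at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 256x256 weight matrix with true rank 16, so that a rank-32
# sketch can capture it almost exactly (an idealized best case).
m, n, true_rank = 256, 256, 16
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))

def sketch_factorize(W, k, rng):
    """Randomized rangefinder: W ~= Q @ B with Q (m x k) and B (k x n)."""
    omega = rng.standard_normal((W.shape[1], k))  # Gaussian test matrix
    Q, _ = np.linalg.qr(W @ omega)                # orthonormal basis for the sketched range
    B = Q.T @ W                                   # project W onto that basis
    return Q, B

k = 32
Q, B = sketch_factorize(W, k, rng)

# A sketched linear layer stores Q and B instead of W, and evaluates
# y = x @ W.T as y = (x @ B.T) @ Q.T.
x = rng.standard_normal((8, n))
y_exact = x @ W.T
y_sketch = (x @ B.T) @ Q.T

rel_err = np.linalg.norm(y_exact - y_sketch) / np.linalg.norm(y_exact)
saved = 1 - (Q.size + B.size) / W.size
print(f"relative error: {rel_err:.2e}, parameters saved: {saved:.0%}")
```

With these dimensions the two factors hold 2·32·256 parameters versus 256·256 for W, a 75% reduction, mirroring the savings figure quoted for BERT; for weights that are only approximately low-rank, the reconstruction error grows with the gap between k and the effective rank.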
URL
https://arxiv.org/abs/2601.15473