HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

2024-05-16 17:53:32
Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Jorg K. H. Franke, Frank Hutter

Abstract

The expanding size of language models has created the necessity for a comprehensive examination across various dimensions that reflect the desiderata with respect to the tradeoffs between various hardware metrics, such as latency, energy consumption, GPU memory usage, and performance. There is a growing interest in establishing Pareto frontiers for different language model configurations to identify optimal models with specified hardware constraints. Notably, architectures that excel in latency on one device may not perform optimally on another. However, exhaustive training and evaluation of numerous architectures across diverse hardware configurations is computationally prohibitive. To this end, we propose HW-GPT-Bench, a hardware-aware language model surrogate benchmark, where we leverage weight-sharing techniques from Neural Architecture Search (NAS) to efficiently train a supernet proxy, encompassing language models of varying scales in a single model. We conduct profiling of these models across 13 devices, considering 5 hardware metrics and 3 distinct model scales. Finally, we showcase the usability of HW-GPT-Bench using 8 different multi-objective NAS algorithms and evaluate the quality of the resultant Pareto fronts. Through this benchmark, our objective is to propel and expedite research in the advancement of multi-objective methods for NAS and structural pruning in large language models.
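To make the notion of a Pareto frontier in the abstract concrete, here is a minimal sketch that filters a handful of candidate configurations down to the non-dominated ones. The (latency, perplexity) values are made up for illustration and are not results from the benchmark; the function name `pareto_front` is likewise hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Each point is (latency_ms, perplexity); both objectives are minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    q dominates p if q is no worse than p in every objective
    and strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if i == j:
                continue
            no_worse = all(qk <= pk for qk, pk in zip(q, p))
            strictly_better = any(qk < pk for qk, pk in zip(q, p))
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(p)
    return sorted(front)


# Illustrative, invented measurements for five hypothetical architectures.
configs = [(10.0, 30.0), (12.0, 25.0), (15.0, 26.0), (20.0, 22.0), (11.0, 31.0)]
print(pareto_front(configs))
```

Because latency depends on the device, running the same filter on measurements taken from a different device may yield a different front, which is the situation the abstract describes.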

URL

https://arxiv.org/abs/2405.10299

PDF

https://arxiv.org/pdf/2405.10299.pdf

