
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

2025-05-22 17:48:53
Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud

Abstract

We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA ↔ HIP) and assembly-level (Nvidia SASS ↔ AMD RDNA3) translation. The dataset comprises 70k verified code pairs spanning host and device code, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, which achieve 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial and open-source baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation. The dataset and benchmark are on HuggingFace (this https URL), with code on GitHub (this https URL).
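For context on what "source-level (CUDA ↔ HIP)" translation involves, and on what a rule-based baseline like Hipify does, here is a minimal sketch. It is a toy renamer that maps a handful of CUDA runtime identifiers to their HIP equivalents. The function `hipify` and the mapping table are hypothetical illustrations. They are not the HIPIFY tools themselves, which cover far more of the API, and they are not the CASS models. Kernel code and the `<<<...>>>` launch syntax are left untouched, because HIP accepts them as written.

```python
import re

# Hypothetical toy subset of CUDA runtime -> HIP runtime renames (illustration only;
# the real HIPIFY tools handle a much larger API surface).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Longest names first, wrapped in \b word boundaries, so whole identifiers are
# replaced and substrings of unrelated identifiers are left alone.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def hipify(source: str) -> str:
    """Rename known CUDA runtime identifiers to HIP; everything else is unchanged."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    cuda_src = """#include <cuda_runtime.h>
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
void run(float *h, int n) {
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
"""
    # Host API calls become hip*; the kernel body and launch syntax stay as-is.
    print(hipify(cuda_src))
```

Identifier renaming of this kind applies only at the source level. It offers no analogue for the SASS ↔ RDNA3 assembly translation the abstract also targets.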

URL

https://arxiv.org/abs/2505.16968

PDF

https://arxiv.org/pdf/2505.16968.pdf

