Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback

2025-04-22 11:38:14
Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

Abstract

Large language models (LLMs) have shown strong performance in generating Verilog from natural-language descriptions. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs, aligning the training with the fundamental goal of hardware design: functional correctness. The main obstacle in using LLMs for Verilog code generation is the lack of sufficient functional verification data, particularly testbenches paired with design specifications and code. To address this problem, we introduce an automatic testbench generation pipeline that decomposes the process and uses feedback from the Verilog compiler simulator (VCS) to reduce hallucination and ensure correctness. We then use the testbenches to evaluate the generated code and collect the results for further training, which is where the verification insights are introduced. Our method applies reinforcement learning (RL), specifically direct preference optimization (DPO), to align Verilog code generation with functional correctness by training on preference pairs based on testbench outcomes. In evaluations on VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2, our approach consistently outperforms state-of-the-art baselines in generating functionally correct Verilog code. We open source all training code, data, and models at this https URL.
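
As a rough illustration of the idea in the abstract, the sketch below pairs candidates that pass a testbench with candidates that fail it, then evaluates the standard DPO objective on one such pair. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary layout, the all-pairs pairing strategy, and the `beta` value are hypothetical, and the abstract does not say how pairs are actually selected.

```python
import math
from itertools import product


def build_preference_pairs(spec, candidates):
    """Pair every passing candidate with every failing one (assumed strategy).

    `candidates` is a list of (verilog_code, passed) tuples, where `passed`
    is the testbench verdict for that candidate.
    """
    passed = [code for code, ok in candidates if ok]
    failed = [code for code, ok in candidates if not ok]
    return [
        {"prompt": spec, "chosen": good, "rejected": bad}
        for good, bad in product(passed, failed)
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, from sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical candidates for one specification.
    pairs = build_preference_pairs(
        "4-bit counter with synchronous reset",
        [("module a ...", True), ("module b ...", False), ("module c ...", False)],
    )
    print(len(pairs), "preference pairs")
    # Policy already prefers the passing candidate more than the reference does.
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

The loss falls below log 2 once the policy favors the passing candidate more strongly than the reference model does, which is the sense in which testbench outcomes steer training toward functional correctness.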

URL

https://arxiv.org/abs/2504.15804

PDF

https://arxiv.org/pdf/2504.15804.pdf

