Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback

Abstract

Large language models (LLMs) have shown strong performance in generating Verilog from natural language descriptions. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs, aligning training with the fundamental goal of hardware design: functional correctness. The main obstacle in using LLMs for Verilog code generation is the lack of sufficient functional verification data, particularly testbenches paired with design specifications and code. To address this problem, we introduce an automatic testbench generation pipeline that decomposes the process and uses feedback from the Verilog compiler simulator (VCS) to reduce hallucination and ensure correctness. We then use the testbenches to evaluate generated code and collect the results for further training, where verification insights are introduced. Our method applies reinforcement learning (RL), specifically direct preference optimization (DPO), to align Verilog code generation with functional correctness by training on preference pairs based on testbench outcomes. In evaluations on VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2, our approach consistently outperforms state-of-the-art baselines in generating functionally correct Verilog code. We open source all training code, data, and models at this https URL.
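
The idea of training on preference pairs derived from testbench outcomes can be illustrated with a minimal sketch. This is not the authors' implementation. The `Candidate` class, the `build_preference_pairs` helper, the all-passing-versus-all-failing pairing rule, and the default `beta=0.1` are assumptions made for illustration. Only the per-pair loss follows the standard DPO formulation, `-log(sigmoid(beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))))`.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: one generated Verilog sample and its testbench result."""
    code: str
    passed: bool


def build_preference_pairs(spec, candidates):
    """Pair every passing candidate (chosen) with every failing one (rejected).

    The exhaustive pairing rule is an assumption; the abstract only states
    that pairs are based on testbench outcomes.
    """
    passing = [c for c in candidates if c.passed]
    failing = [c for c in candidates if not c.passed]
    return [
        {"prompt": spec, "chosen": w.code, "rejected": l.code}
        for w, l in product(passing, failing)
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    samples = [
        Candidate("module a; /* passes */ endmodule", True),
        Candidate("module b; /* fails */ endmodule", False),
        Candidate("module c; /* fails */ endmodule", False),
    ]
    pairs = build_preference_pairs("Design a 2-to-1 mux.", samples)
    print(len(pairs), "preference pairs")
    print("loss at zero margin:", dpo_loss(0.0, 0.0, 0.0, 0.0))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the candidate that passed the testbench. A specification whose candidates all pass or all fail yields no pairs under this pairing rule.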

URL

https://arxiv.org/abs/2504.15804

PDF

https://arxiv.org/pdf/2504.15804.pdf
