Abstract
While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate generating accurate and functional code. Despite the success of retrieval-augmented generation (RAG) in various text-oriented tasks, its potential for improving code generation remains under-explored. In this work, we conduct a systematic, large-scale analysis by asking two questions: in what scenarios can retrieval benefit code generation models, and what challenges remain? We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks: basic programming, open-domain, and repository-level problems. We aggregate documents from five sources for models to retrieve contexts from: competition solutions, online tutorials, library documentation, StackOverflow posts, and GitHub repositories. We evaluate top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources. While retrieving high-quality contexts yields notable gains in final code generation across various settings, our analysis reveals room for improvement: current retrievers still struggle to fetch useful contexts, especially when lexical overlap is limited, and generators fail to improve when constrained by limited context lengths or by a limited ability to integrate additional contexts. We hope CodeRAG-Bench serves as an effective testbed to encourage further development of advanced code-oriented RAG methods.
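The pipeline the abstract describes, retrieving contexts from a document pool and prepending them to the generation prompt, can be sketched minimally as below. This is an illustrative assumption-laden toy, not the paper's implementation: the corpus snippets, the query, and the simple term-overlap retriever (a stand-in for the lexical and dense retrievers the benchmark evaluates) are all hypothetical.

```python
# Minimal sketch of retrieval-augmented code generation (RAG).
# Assumptions: a toy corpus of documentation snippets and a bag-of-words
# term-overlap scorer standing in for a real retriever such as BM25.
from collections import Counter


def lexical_score(query: str, doc: str) -> int:
    """Count overlapping terms between query and document (bag-of-words)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents ranked by lexical overlap with the query."""
    ranked = sorted(corpus, key=lambda doc: lexical_score(query, doc), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Prepend retrieved contexts to the task instruction for the generator."""
    ctx = "\n".join(f"# Context: {c}" for c in contexts)
    return f"{ctx}\n# Task: {query}\n"


corpus = [
    "pandas read_csv loads a CSV file into a DataFrame",
    "numpy argsort returns indices that would sort an array",
    "requests get sends an HTTP GET request",
]
query = "load a CSV file with pandas"
prompt = build_prompt(query, retrieve(query, corpus))
```

The resulting `prompt` would then be passed to a code LM; the paper's finding that retrievers struggle with limited lexical overlap is visible here, since a purely lexical scorer cannot match a query phrased without the document's terms.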
URL
https://arxiv.org/abs/2406.14497