Abstract
Public Code Review (PCR) complements a development team's internal code review through a public Software Question Answering (SQA) community, helping developers access high-quality and efficient review services. Current methods for PCR mainly focus on the reviewer's perspective, including finding a capable reviewer, predicting comment quality, and recommending or generating review comments. However, how to satisfy the review necessity requests posted by developers is not well studied, even though doing so can increase the requests' visibility, which in turn is a prerequisite for better review responses. To this end, we propose Knowledge-guided Prompt learning for Public Code Review (KP-PCR) to achieve developer-based code review request quality assurance (i.e., the request necessity prediction and tag recommendation subtasks). Specifically, we reformulate the two subtasks via 1) text prompt tuning, which converts both of them into a Masked Language Model (MLM) task by constructing prompt templates using hard prompts; and 2) knowledge and code prefix tuning, which introduces external knowledge through soft prompts and uses data flow diagrams to characterize code snippets. Finally, both the request necessity prediction and tag recommendation subtasks output predicted results through an answer engineering module. In addition, we further analyze the time complexity of KP-PCR, which introduces knowledge through a lightweight prefix-based operation. Experimental results on the PCR dataset for the period 2011-2023 demonstrate that KP-PCR outperforms baselines by 8.3%-28.8% in request necessity prediction and by 0.1%-29.5% in tag recommendation. The code implementation is released at this https URL.
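To make the hard-prompt reformulation and the answer engineering step more concrete, the following is a minimal sketch, not the authors' implementation. The template wording, the label words in `VERBALIZER`, the function names, and the toy mask-token scores are all assumptions made for illustration; in practice the scores would come from a masked language model's prediction at the mask position.

```python
from typing import Dict, List

MASK = "[MASK]"  # placeholder mask token; the real token depends on the model used


def build_necessity_prompt(title: str, body: str) -> str:
    """Wrap a review request in a hand-written (hard) template with one mask slot."""
    # Hypothetical template wording, chosen only for this example.
    return f"Request: {title}. {body} Is a review necessary? {MASK}."


# Hypothetical verbalizer: each class is represented by a few label words.
VERBALIZER: Dict[str, List[str]] = {
    "necessary": ["yes", "needed"],
    "unnecessary": ["no", "unneeded"],
}


def answer_engineering(mask_scores: Dict[str, float]) -> str:
    """Map mask-token scores to a class by summing the scores of its label words."""
    totals = {
        label: sum(mask_scores.get(word, 0.0) for word in words)
        for label, words in VERBALIZER.items()
    }
    return max(totals, key=totals.get)


prompt = build_necessity_prompt("Speed up CSV parser", "My loop is slow on large files.")

# Toy scores standing in for the model's distribution at the mask position.
toy_scores = {"yes": 0.55, "needed": 0.10, "no": 0.25, "unneeded": 0.05}
prediction = answer_engineering(toy_scores)
print(prompt)
print(prediction)
```

Tag recommendation could plausibly reuse the same pattern with one verbalizer entry per candidate tag and a top-k selection instead of a single argmax, though that extension is likewise an assumption rather than something stated in the abstract.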
URL
https://arxiv.org/abs/2410.21673