Abstract
[...] Since then, various APR approaches, especially those leveraging the power of large language models (LLMs), have been rapidly developed to fix general software bugs. Unfortunately, the effectiveness of these advanced techniques on regression bugs remains largely unexplored. This gap motivates an empirical study evaluating how well modern APR techniques fix real-world regression bugs. In this work, we conduct an empirical study of APR techniques on Java regression bugs. To facilitate our study, we introduce RegMiner4APR, a high-quality benchmark of Java regression bugs integrated into a framework designed to support APR research. The current benchmark includes 99 regression bugs collected from 32 widely used real-world Java GitHub repositories. We begin with an in-depth analysis of the benchmark, demonstrating its diversity and quality. Building on this foundation, we empirically evaluate the capabilities of APR techniques on regression bugs by assessing both traditional APR tools and advanced LLM-based APR approaches. Our experimental results show that traditional APR tools fail to repair any of the bugs, while LLM-based APR approaches exhibit promising potential. Motivated by these results, we investigate the impact of incorporating bug-inducing change information into LLM-based APR approaches for fixing regression bugs. Our results highlight that this context-aware enhancement significantly improves the performance of LLM-based APR, yielding 1.8x more successful repairs than LLM-based APR without such context.
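As a rough illustration of the context-aware enhancement described above (not the authors' actual implementation), the sketch below shows one way a bug-inducing change could be appended to a repair prompt in an LLM-based APR pipeline. All names here (RegressionBug, build_repair_prompt, the prompt wording) are hypothetical.

```python
# Hypothetical sketch: assembling a repair prompt that optionally includes
# the bug-inducing change as extra context for an LLM-based APR approach.
from dataclasses import dataclass


@dataclass
class RegressionBug:
    buggy_method: str        # source of the Java method failing the regression test
    failing_test: str        # the regression test that exposes the bug
    bug_inducing_diff: str   # diff of the commit that introduced the regression


def build_repair_prompt(bug: RegressionBug, with_context: bool = True) -> str:
    """Build an LLM prompt; optionally append the bug-inducing change."""
    parts = [
        "Fix the following Java method so that the failing test passes.",
        "## Buggy method\n" + bug.buggy_method,
        "## Failing regression test\n" + bug.failing_test,
    ]
    if with_context:
        # Context-aware variant: expose the change that introduced the bug.
        parts.append("## Bug-inducing change (git diff)\n" + bug.bug_inducing_diff)
    return "\n\n".join(parts)
```

In this sketch, toggling `with_context` corresponds to the two evaluated settings: LLM-based APR with and without bug-inducing change information.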
URL
https://arxiv.org/abs/2506.13182