From Empirical Evaluation to Context-Aware Enhancement: Repairing Regression Errors with LLMs

2025-06-16 07:49:18
Anh Ho, Thanh Le-Cong, Bach Le, Christine Rizkallah

Abstract

[...] Since then, various APR approaches, especially those leveraging the power of large language models (LLMs), have been rapidly developed to fix general software bugs. Unfortunately, the effectiveness of these advanced techniques on regression bugs remains largely unexplored. This gap motivates an empirical study of how well modern APR techniques fix real-world regression bugs. In this work, we conduct such a study on Java regression bugs. To support it, we introduce RegMiner4APR, a high-quality benchmark of Java regression bugs integrated into a framework designed to facilitate APR research. The current benchmark includes 99 regression bugs collected from 32 widely used real-world Java GitHub repositories. We begin with an in-depth analysis of the benchmark, demonstrating its diversity and quality. Building on this foundation, we empirically evaluate the capabilities of APR on regression bugs by assessing both traditional APR tools and advanced LLM-based APR approaches. Our experimental results show that classical APR tools fail to repair any bugs, while LLM-based APR approaches exhibit promising potential. Motivated by these results, we investigate the impact of incorporating bug-inducing change information into LLM-based APR approaches for fixing regression bugs. Our results highlight that this context-aware enhancement significantly improves the performance of LLM-based APR, yielding 1.8x more successful repairs than LLM-based APR without such context.
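
The context-aware idea described in the abstract, adding bug-inducing change information to what an LLM-based repair tool sees, might look roughly like the sketch below. This is only an illustration and not the authors' implementation: the function name `build_repair_prompt`, the section labels, and the prompt wording are all hypothetical, and the abstract does not specify how the paper actually formats this context.

```python
from typing import Optional


def build_repair_prompt(
    buggy_code: str,
    failing_test: str,
    bug_inducing_diff: Optional[str] = None,
) -> str:
    """Assemble a repair prompt (hypothetical format, for illustration only).

    When bug_inducing_diff is given, the diff of the change that introduced
    the regression is appended as extra context for the model.
    """
    sections = [
        "You are fixing a regression bug in a Java project.",
        "Buggy code:\n" + buggy_code,
        "Failing test:\n" + failing_test,
    ]
    if bug_inducing_diff:
        # Assumed label; the paper's actual prompt layout is not given here.
        sections.append(
            "Bug-inducing change (diff that introduced the regression):\n"
            + bug_inducing_diff
        )
    sections.append("Return the fixed code only.")
    return "\n\n".join(sections)


# Baseline prompt versus context-aware prompt for the same toy bug.
baseline = build_repair_prompt("int f() { return 1; }", "assertEquals(2, f());")
with_context = build_repair_prompt(
    "int f() { return 1; }",
    "assertEquals(2, f());",
    "- return 2;\n+ return 1;",
)
```

The only difference between the two prompts is the extra diff section, which mirrors the comparison the abstract reports between LLM-based APR with and without bug-inducing change context.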

URL

https://arxiv.org/abs/2506.13182

PDF

https://arxiv.org/pdf/2506.13182.pdf

