Abstract
Recent advances in large language models, including GPT-4 and its variants, and in generative AI-assisted coding tools such as GitHub Copilot, ChatGPT, and Tabnine, have significantly transformed software development. This paper analyzes how these innovations affect productivity and software test development metrics. These tools enable developers to generate complete software programs with minimal human intervention before deployment; however, thorough review and testing by developers remain crucial. Using the Test Pyramid concept, which categorizes tests into unit, integration, and end-to-end tests, we evaluate three popular AI coding assistants by generating unit tests for open-source modules and comparing them against the original tests. Our findings show that the AI-generated tests are of quality equivalent to the originals, while also revealing differences in usage and results among the tools. This research improves the understanding of the capabilities of AI assistant tools in automated testing.
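To make the evaluation setting concrete: a unit test, the bottom layer of the Test Pyramid, is a small self-contained check of a single function. Below is a minimal illustrative sketch in Python using pytest; the slugify function and its tests are hypothetical examples of the kind of test an AI coding assistant might generate, not code taken from the paper.

    import re

    def slugify(text: str) -> str:
        """Convert arbitrary text to a lowercase, hyphen-separated slug."""
        text = re.sub(r"[^a-z0-9]+", "-", text.strip().lower())
        return text.strip("-")

    # Unit tests of the kind an AI coding assistant might generate
    # (hypothetical; run with pytest).
    def test_slugify_basic():
        assert slugify("Hello World") == "hello-world"

    def test_slugify_strips_punctuation():
        assert slugify("  Rust & Go!  ") == "rust-go"

    def test_slugify_empty_string():
        assert slugify("") == ""

Tests at this granularity are cheap to run and easy to compare against a module's original suite, which is what makes them a natural target for the kind of tool comparison the paper describes.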
URL
https://arxiv.org/abs/2411.02328