Abstract
This study presents the results of research and experimental validation in automated codebase migration, focusing on the challenges of transitioning SQL-based systems. The proposed migration method is a framework that draws on the strengths of traditional software engineering techniques and provides an iterative, scalable, precise, and efficient solution for modern database transformations. At the core of the approach is a fine-tuned Large Language Model (LLM). It addresses critical issues in SQL code conversion, such as syntax mapping and resolving discrepancies between Oracle PL/SQL and PostgreSQL. It also optimises database elements such as stored procedures, triggers, views, and overall database logic. The method involves a trade-off between fine-tuning and prompt engineering. Particular emphasis is placed on fine-tuning, which improves the model's adaptability and compatibility with migration requirements across the entire database, and the results indicate that fine-tuning plays a key role. The study uses targeted evaluation methodologies and computational metrics to measure the success of iterative conversion cycles. Core innovations include automated SQL feature detection, semi-supervised error analysis, and the integration of Subject Matter Expert (SME) feedback into a systematic migration workflow. The methodology achieves significant reductions in Syntax Error Rates and improves feature alignment across migration iterations. It also uses dataset sampling to ensure continual improvement. By embedding Generative AI (GAI) into the migration process, the framework enables precise feature mapping, semi-automated error resolution, and data-driven optimisation loops, improving workflow efficiency.
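The abstract does not specify how SQL features are detected or how the Syntax Error Rate is computed. The Python sketch below shows one possible approach. It assumes a simple regex-based detector for Oracle-specific constructs that usually need rewriting for PostgreSQL. It also treats the Syntax Error Rate as the fraction of converted objects that fail to parse or compile on the target side. The feature catalogue, `detect_features`, and `syntax_error_rate` are hypothetical illustrations, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical catalogue of Oracle-specific constructs that typically need
# rewriting for PostgreSQL. Illustrative only, not the paper's feature list.
ORACLE_FEATURES = {
    "NVL": re.compile(r"\bNVL\s*\(", re.IGNORECASE),
    "DECODE": re.compile(r"\bDECODE\s*\(", re.IGNORECASE),
    "SYSDATE": re.compile(r"\bSYSDATE\b", re.IGNORECASE),
    "ROWNUM": re.compile(r"\bROWNUM\b", re.IGNORECASE),
    "CONNECT BY": re.compile(r"\bCONNECT\s+BY\b", re.IGNORECASE),
    "%TYPE": re.compile(r"%TYPE\b", re.IGNORECASE),
}


def detect_features(sql: str) -> Counter:
    """Count occurrences of each catalogued Oracle construct in one SQL object.

    Only features that actually occur are stored in the returned Counter.
    """
    counts = Counter()
    for name, pattern in ORACLE_FEATURES.items():
        n = len(pattern.findall(sql))
        if n:
            counts[name] = n
    return counts


def syntax_error_rate(parse_ok: list[bool]) -> float:
    """Fraction of converted objects that failed to parse.

    Assumption: each entry is True if the converted object was accepted by
    the target (e.g. compiled against a PostgreSQL instance), else False.
    """
    if not parse_ok:
        return 0.0
    return sum(1 for ok in parse_ok if not ok) / len(parse_ok)
```

In an iterative cycle, one could recompute the rate after each conversion pass. The remaining failures, grouped by detected feature, could then go to SMEs for review, and their corrections could feed the next round of fine-tuning or prompt refinement.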
URL
https://arxiv.org/abs/2512.13515