Fine-tuned LLM-based Code Migration Framework

2025-12-15 16:42:51
Oleg Grynets, Vasyl Lyashkevych, Dmytro Baran, Maksym Orliansky, Taras Zelenyy, Markiian Leshchyshyn

Abstract

This study presents research and experimental validation in automated codebase migration, focusing on the challenges of transitioning SQL-based systems. The proposed migration method is a framework that builds on established software engineering techniques to provide an iterative, scalable, precise and efficient solution for modern database transformations. At its centre is a fine-tuned Large Language Model (LLM) that addresses critical issues in SQL code conversion, such as syntax mapping, resolving discrepancies between Oracle PL/SQL and PostgreSQL, and optimising database elements such as stored procedures, triggers, views, and overall database logic. The method therefore involves a trade-off between fine-tuning and prompt engineering. Special attention is given to the fine-tuning approach, which improves the model's adaptability and compatibility with migration requirements across the entire database. The results indicate that fine-tuning plays a very important role. The study employs targeted evaluation methodologies and computational metrics to measure the success of iterative conversion cycles. Core innovations include automated SQL feature detection, semi-supervised error analysis, and the integration of Subject Matter Expert (SME) feedback within a systematic migration workflow. The methodology achieves significant reductions in Syntax Error Rate, improves feature alignment across migration iterations, and uses dataset sampling to ensure continual improvement. By embedding Generative AI (GAI) into the migration process, the framework facilitates precise feature mapping, semi-automated error resolution, and data-driven optimisation loops, improving workflow efficiency.
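
To make two of the ideas named in the abstract more concrete, here is a minimal Python sketch of automated SQL feature detection and a Syntax Error Rate calculation. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the feature list nor the metric's formula, so the pattern table `ORACLE_FEATURES`, the function names, and the definition of the rate as "failed statements divided by total statements" are all hypothetical. The four constructs shown (`NVL`, `SYSDATE`, `FROM DUAL`, `VARCHAR2`) are Oracle features that commonly need attention when moving to PostgreSQL.

```python
import re

# Hypothetical pattern table: a few Oracle-specific constructs that
# typically need mapping when targeting PostgreSQL. Illustrative only;
# this is not the feature set used in the paper.
ORACLE_FEATURES = {
    "NVL": re.compile(r"\bNVL\s*\(", re.IGNORECASE),
    "SYSDATE": re.compile(r"\bSYSDATE\b", re.IGNORECASE),
    "DUAL": re.compile(r"\bFROM\s+DUAL\b", re.IGNORECASE),
    "VARCHAR2": re.compile(r"\bVARCHAR2\b", re.IGNORECASE),
}


def detect_features(sql: str) -> set[str]:
    """Return the names of the Oracle-specific constructs found in `sql`."""
    return {name for name, pattern in ORACLE_FEATURES.items() if pattern.search(sql)}


def syntax_error_rate(parse_ok: list[bool]) -> float:
    """Fraction of converted statements that failed a syntax check.

    Assumed definition: failures / total. `parse_ok` holds one flag per
    converted statement, True when the statement parsed on the target system.
    """
    if not parse_ok:
        return 0.0
    failures = sum(1 for ok in parse_ok if not ok)
    return failures / len(parse_ok)


if __name__ == "__main__":
    source = "SELECT NVL(salary, 0), SYSDATE FROM DUAL"
    print(sorted(detect_features(source)))
    # One flag per converted statement from a hypothetical iteration.
    print(syntax_error_rate([True, False, True, True]))
```

In an iterative workflow of the kind the abstract describes, the detected features could be used to sample statements for the next conversion cycle, and the rate could be tracked per iteration to check that it falls over time.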

URL

https://arxiv.org/abs/2512.13515

PDF

https://arxiv.org/pdf/2512.13515.pdf
