Abstract
The emergence of foundation models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translation between programming languages. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated with task-specific tools, complemented by methods for verifying correctness and developing effective prompts. We explored the application of GenAI to code translation, language interoperability, and codebase inspection in a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient code-conversion process. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs that integrate legacy systems with modern C++ libraries, and supporting developers with code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for productivity in scientific computing workflows.
URL
https://arxiv.org/abs/2410.24119