Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing

2024-10-31 16:48:41
Akash Dhruv, Anshu Dubey

Abstract

The emergence of foundational models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translating from one programming language to another. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated through task-specific tools, alongside additional methodologies for correctness verification and effective prompt development. We explored the application of GenAI in assisting with code translation, language interoperability, and codebase inspection within a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs for integrating legacy systems with modern C++ libraries, and providing developer support for code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing workflows.

URL

https://arxiv.org/abs/2410.24119

PDF

https://arxiv.org/pdf/2410.24119.pdf

