Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation

2024-04-22 17:21:24
Gábor Antal, Richárd Vozár, Rudolf Ferenc

Abstract

The emergence of advanced neural networks has opened up new avenues for automated code generation from conceptual models, promising to enhance software development processes. This paper presents a preliminary evaluation of GPT-4-Vision, a state-of-the-art deep learning model, and its ability to transform Unified Modeling Language (UML) class diagrams into fully functional Java class files. In our study, we used exported images of 18 class diagrams: 10 single-class and 8 multi-class diagrams. We used 3 different prompts for each input and evaluated the results manually. We created a scoring system that scores whether each element found in the diagram occurs in the generated source code. On average, the model was able to generate source code for 88% of the elements shown in the diagrams. Our results indicate that GPT-4-Vision is proficient at handling single-class UML diagrams, successfully transforming them into syntactically correct class files. For multi-class UML diagrams, however, the model's performance is weaker. In summary, further investigation is necessary to fully exploit the model's potential.
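
The abstract does not spell out the scoring system in detail, so the following is only a minimal sketch of one way an element-occurrence score could be computed, assuming the score is the fraction of diagram elements that also appear in the generated code. The function name `coverage_score` and the element labels are hypothetical and are not taken from the paper.

```python
def coverage_score(expected, found):
    """Return the fraction of expected diagram elements present in the generated code.

    Assumption: every element carries equal weight. The paper's actual
    scoring system may weight or categorize elements differently.
    """
    expected = set(expected)
    if not expected:
        # Assumption: an empty diagram counts as fully covered.
        return 1.0
    return len(expected & set(found)) / len(expected)


# Hypothetical elements of a single-class diagram.
diagram_elements = {"class Account", "field balance", "method deposit", "method withdraw"}
# Hypothetical elements identified by manual review of the generated Java file.
generated_elements = {"class Account", "field balance", "method deposit"}

score = coverage_score(diagram_elements, generated_elements)
print(f"{score:.0%}")  # prints "75%"
```

Under this reading, the reported 88% average would correspond to the mean of such per-diagram scores across all diagrams and prompts, although the abstract does not confirm how the average was aggregated.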

URL

https://arxiv.org/abs/2404.14370

PDF

https://arxiv.org/pdf/2404.14370.pdf

