Exploring Distributional Shifts in Large Language Models for Code Analysis

2023-03-16 07:45:46
Shushan Arakelyan, Rocktim Jyoti Das, Yi Mao, Xiang Ren

Abstract

We systematically study the capacity of two large language models for code - CodeT5 and Codex - to generalize to out-of-domain data. In this study, we consider two fundamental applications: code summarization and code generation. We split data into domains along its natural boundaries - by organization, by project, and by module within a software project. This makes recognizing in-domain vs. out-of-domain data at deployment time trivial. We establish that samples from each new domain present both models with a significant distribution-shift challenge. We study how well established adaptation methods help the models generalize better to new domains. Our experiments show that while multitask learning alone is a reasonable baseline, combining it with few-shot finetuning on examples retrieved from the training data can achieve very strong performance. In fact, according to our experiments, this solution can outperform direct finetuning in very low-data scenarios. Finally, we consider variations of this approach to create a more broadly applicable method that adapts to multiple domains at once. We find that for code generation, a model adapted to multiple domains simultaneously performs on par with models adapted to each domain individually.
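
As a concrete illustration of the adaptation recipe the abstract describes - retrieve training examples similar to an incoming out-of-domain sample, then finetune on them for a few steps - here is a minimal Python sketch. The paper does not specify the retriever; TF-IDF cosine similarity over code tokens is assumed here for simplicity, and finetune_on is a hypothetical stand-in for whatever few-shot finetuning routine is used.

    # Minimal sketch of retrieval-based few-shot adaptation (assumed
    # TF-IDF retriever; the paper's actual retriever may differ).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve_neighbors(query_code, train_corpus, k=8):
        """Return the k training snippets most similar to query_code."""
        vectorizer = TfidfVectorizer(token_pattern=r"\w+")
        # Fit on the corpus plus the query so they share one vocabulary.
        matrix = vectorizer.fit_transform(list(train_corpus) + [query_code])
        # Last row is the query; score it against every training row.
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        return [train_corpus[i] for i in scores.argsort()[::-1][:k]]

    # Usage (finetune_on is hypothetical - any few-shot finetuning loop):
    # neighbors = retrieve_neighbors(new_domain_sample, train_snippets)
    # adapted_model = finetune_on(multitask_pretrained_model, neighbors)

The key design point this sketch reflects is that the retrieved neighbors come only from the existing training data, so adaptation requires no labeled data from the new domain itself.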

URL

https://arxiv.org/abs/2303.09128

PDF

https://arxiv.org/pdf/2303.09128.pdf

