Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain

2024-04-12 06:21:48
Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki

Abstract

Several previous studies have considered language- and domain-specific large language models (LLMs) as separate topics. This study explores the combination of a non-English language and a high-demand industry domain, focusing on a Japanese business-specific LLM. This type of model requires expertise in the business domain, strong language skills, and regular updates of its knowledge. We trained a 13-billion-parameter LLM from scratch using a new dataset of business texts and patents, and continually pretrained it with the latest business documents. Furthermore, we propose a new benchmark for Japanese business domain question answering (QA) and evaluate our models on it. The results show that our pretrained model improves QA accuracy without losing general knowledge, and that continual pretraining enhances adaptation to new information. Our pretrained model and business domain benchmark are publicly available.

URL

https://arxiv.org/abs/2404.08262

PDF

https://arxiv.org/pdf/2404.08262.pdf

