Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?

2024-04-24 00:56:22
Hossein Salami (1), Brandye Smith-Goettler (2), Vijay Yadav (2) ((1) Digital Services, MMD, Merck & Co., Inc., Rahway, NJ, USA, (2) Digital Services, MMD, Merck & Co., Inc., West Point, PA, USA)

Abstract

General-purpose large language models (LLMs) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models perform remarkably well on a variety of natural language processing tasks. However, how to leverage them for domain-specific use cases and drive value remains an open question. In this work, we focus on a specific use case, pharmaceutical manufacturing investigations, and propose that leveraging an organization's historical records of manufacturing incidents and deviations can help address and close new cases or de-risk new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the ability of three general-purpose LLMs (GPT-3.5, GPT-4, and Claude-2) to perform tasks related to this goal. In particular, we examine (1) the ability of LLMs to automate the extraction of specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on the database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases of complex interplay between the apparent reasoning and the hallucination behavior of LLMs as a risk factor. Furthermore, we show that semantic search on vector embeddings of deviation descriptions can identify similar records, such as those with a similar type of defect, with a high level of accuracy. We also discuss further improvements to enhance the accuracy of similar-record identification.
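The second task above, semantic search over vector embeddings of deviation descriptions, comes down to ranking historical records by how close their embeddings are to the embedding of a new case. The sketch below is a minimal illustration under stated assumptions: it uses cosine similarity as the metric and tiny made-up three-dimensional vectors in place of real embeddings. The record IDs, vectors, and the helper names `cosine_similarity` and `top_k_similar` are hypothetical and are not taken from the paper, which does not specify its embedding model or similarity metric in the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, records, k=3):
    """Rank historical records by cosine similarity to the query embedding, highest first."""
    scored = [(rid, cosine_similarity(query_vec, vec)) for rid, vec in records.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings. In practice each vector would be produced by an
# embedding model applied to the text of a deviation description.
historical = {
    "DEV-001": [0.9, 0.1, 0.0],
    "DEV-002": [0.1, 0.8, 0.3],
    "DEV-003": [0.85, 0.2, 0.05],
}
new_case = [0.88, 0.15, 0.02]

if __name__ == "__main__":
    for record_id, score in top_k_similar(new_case, historical, k=2):
        print(record_id, round(score, 3))
```

With real data, the top-ranked records would be the candidates an investigator reviews as possibly related deviations; a production system would typically store the embeddings in a vector index rather than scanning a dictionary.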

URL

https://arxiv.org/abs/2404.15578

PDF

https://arxiv.org/pdf/2404.15578.pdf
