Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?

Abstract

General-purpose large language models (LLMs) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models perform remarkably well on a variety of natural language processing tasks. However, how to apply them to domain-specific use cases and derive value from them remains an open question. In this work, we focus on one such use case, pharmaceutical manufacturing investigations, and propose that an organization's historical records of manufacturing incidents and deviations can help address and close new cases or de-risk new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the performance of three general-purpose LLMs (GPT-3.5, GPT-4, and Claude-2) on tasks related to this goal. In particular, we examine (1) the ability of LLMs to automate the extraction of specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on a database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases in which a complex interplay between the apparent reasoning and the hallucination behavior of LLMs poses a risk. Furthermore, we show that semantic search on vector embeddings of deviation descriptions can identify similar records, such as those with a similar type of defect, with a high level of accuracy. We also discuss improvements that could further enhance the accuracy of similar-record identification.
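The similar-record search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which embedding model or similarity measure was used, so the code below assumes cosine similarity and substitutes a toy bag-of-words vector for a real embedding model. The function names (`embed`, `cosine`, `most_similar`) and the deviation records are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query: str, records: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank historical records by similarity to the query description."""
    query_vec = embed(query)
    scored = [(rec_id, cosine(query_vec, embed(text))) for rec_id, text in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical deviation descriptions, invented for this example.
HISTORICAL = {
    "DEV-001": "glass particles found in vial during visual inspection",
    "DEV-002": "temperature excursion in cold storage room",
    "DEV-003": "cracked vial found during visual inspection of filled units",
}

if __name__ == "__main__":
    for rec_id, score in most_similar("particles found in vial", HISTORICAL):
        print(rec_id, round(score, 3))
```

In a real setting, `embed` would be replaced by a call to a learned text-embedding model, which captures meaning rather than exact word overlap; the ranking logic would stay the same.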

URL

https://arxiv.org/abs/2404.15578

PDF

https://arxiv.org/pdf/2404.15578.pdf
