Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

Abstract

The integration of Artificial Intelligence (AI) in healthcare has transformative potential for improving operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown that they can support medical decision-making, and embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, but few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency in outpatient triage, this study evaluates the consistency of the responses ChatGPT provides in outpatient guidance, covering both within-version response analysis and between-version comparisons. Within versions, the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and both show moderate consistency in their top recommendation (71.2% for 4.0 and 59.6% for 3.5). However, the between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating that few recommendations match between the two versions. Moreover, only 50% of top recommendations match perfectly in these comparisons. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations and support further exploration of the potential and limitations of LLMs in healthcare utilization. Future research may focus on optimizing LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, aligned with the specific needs of effective outpatient triage.
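To make the between-version figures more concrete, the following is a minimal sketch of how such a comparison could be computed. It rests on assumptions the abstract does not confirm: that each response is an ordered list of three recommended departments, and that the consistency score (reported out of 3) counts the departments shared by the two versions' lists. The function names and the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean, median


def overlap_score(recs_a, recs_b):
    """Count departments that appear in both lists (0 to 3 for top-3 lists).

    Assumption: this is one plausible reading of a "consistency score out
    of 3"; the study's actual scoring rule may differ.
    """
    return len(set(recs_a) & set(recs_b))


def top_match(recs_a, recs_b):
    """True when both lists are non-empty and share the same first entry."""
    return bool(recs_a) and bool(recs_b) and recs_a[0] == recs_b[0]


def summarize(pairs):
    """Aggregate overlap scores and top-recommendation matches over pairs."""
    scores = [overlap_score(a, b) for a, b in pairs]
    tops = [top_match(a, b) for a, b in pairs]
    return {
        "mean_score": mean(scores),
        "median_score": median(scores),
        "top_match_rate": sum(tops) / len(tops),
    }


# Hypothetical responses: (ChatGPT-3.5 list, ChatGPT-4.0 list) per scenario.
pairs = [
    (["Cardiology", "Respiratory Medicine", "General Medicine"],
     ["Cardiology", "Gastroenterology", "General Medicine"]),
    (["Dermatology", "Allergy", "General Medicine"],
     ["Allergy", "Rheumatology", "Infectious Disease"]),
]

summary = summarize(pairs)
print(summary)
```

With this invented data the two scenarios share two and one departments respectively, and only the first agrees on the top recommendation. The same helpers could be applied to repeated responses from a single version to approximate a within-version check, again under the assumptions stated above.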

Abstract (translated)

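The application of artificial intelligence (AI) in healthcare has transformative potential for improving operational efficiency and health outcomes. Large language models (LLMs) such as ChatGPT have demonstrated their ability to support medical decision-making, and embedding LLMs in medical systems has become a promising trend in healthcare development. The potential of ChatGPT for triage in emergency departments has been examined, while few studies have explored its application in outpatient departments. This study aims to evaluate the consistency of ChatGPT's responses in outpatient guidance, including within-version response analysis and between-version comparison. Within versions, the results show that the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and that both show moderate consistency in their top recommendation (71.2% for 4.0 and 59.6% for 3.5). However, between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating that few recommendations match between the two versions. In addition, only 50% of top recommendations match perfectly in the comparison. Interestingly, ChatGPT-3.5 responses are more likely to be complete (p=0.02), which may reflect differences between the two versions in information processing and response generation. These findings offer insights into AI-assisted outpatient operations and help in exploring the potential and limitations of LLMs in healthcare utilization. Future research may optimize LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, aligned with the specific needs of effective outpatient triage.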

URL

https://arxiv.org/abs/2405.00728

PDF

https://arxiv.org/pdf/2405.00728.pdf
