Abstract
In this paper, we present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process, aimed at addressing uncertainty in the inference of Large Language Models (LLMs). We quantify the uncertainty of an LLM on a given query by calculating the entropy of the generated semantic clusters. Further, we propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within the Conformal Prediction framework, allowing the model to predict a set of responses instead of a single output, thereby accounting for uncertainty in its predictions. We demonstrate the effectiveness of our uncertainty quantification (UQ) technique on two well-known question answering benchmarks, COQA and TriviaQA, using two LLMs, Llama2 and Mistral. Our approach achieves SOTA performance in UQ, as assessed by metrics such as AUROC, AUARC, and AURAC. The proposed conformal predictor is also shown to produce smaller prediction sets while maintaining the same probabilistic guarantee of including the correct response, in comparison to the existing SOTA conformal prediction baseline.
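The two mechanisms the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the split-conformal quantile recipe is a standard assumption, and cluster probabilities are taken as given rather than derived from the Chinese Restaurant Process clustering itself.

```python
import math

def semantic_entropy(cluster_probs):
    """Entropy over semantic-cluster probabilities; higher = more uncertain."""
    return -sum(p * math.log(p) for p in cluster_probs if p > 0)

def conformal_threshold(cal_scores, alpha=0.1):
    """Standard split-conformal quantile of calibration nonconformity
    scores, with the finite-sample (n + 1) correction (an assumption;
    the paper's exact calibration procedure may differ)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(cluster_probs, threshold):
    """Keep clusters whose negative log-likelihood (the nonconformity
    score named in the abstract) does not exceed the threshold."""
    return [i for i, p in enumerate(cluster_probs)
            if p > 0 and -math.log(p) <= threshold]
```

Under this sketch, a query whose responses collapse into one cluster yields zero entropy (low uncertainty), while mass spread across clusters raises the entropy and typically enlarges the conformal prediction set.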
URL
https://arxiv.org/abs/2411.02381