Benchmarking Large Language Model Volatility

Abstract

The impact of non-deterministic outputs from Large Language Models (LLMs) has not been well examined for financial text understanding tasks. Through a case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the inherent volatility of LLM outputs. These uncertainties cascade downstream, leading to even larger variations in portfolio construction and returns. While adjusting the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of the model's creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work provides practitioners with insights for navigating uncertainty when integrating LLMs into financial decision-making, particularly in scenarios driven by non-deterministic information.
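The temperature and ensembling trade-offs in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' experimental setup: it assumes the model's decision for one sentence can be represented by fixed, hypothetical logits over three sentiment labels, samples labels from softmax(logits / T), measures run-to-run agreement, and ensembles repeated samples by majority vote. All function names and logit values here are illustrative only.

```python
import math
import random
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def sample_label(logits, temperature, rng):
    """Sample one label from softmax(logits / temperature).

    In this simulation, temperature == 0 is treated as greedy (argmax)
    decoding, which makes the output deterministic.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return LABELS[max(range(len(logits)), key=lambda i: logits[i])]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(LABELS, weights=weights, k=1)[0]


def run_trials(logits, temperature, n_runs, seed=0):
    """Classify the same sentence n_runs times, as if re-querying the model."""
    rng = random.Random(seed)
    return [sample_label(logits, temperature, rng) for _ in range(n_runs)]


def agreement_rate(labels):
    """Fraction of runs that match the most common label (1.0 = fully stable)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def majority_vote(labels):
    """Ensemble sampled labels; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def ensembled_label(logits, temperature, n_samples, rng):
    """Draw n_samples labels (n_samples model calls) and keep the majority."""
    return majority_vote([sample_label(logits, temperature, rng) for _ in range(n_samples)])


if __name__ == "__main__":
    # Hypothetical scores for (negative, neutral, positive): an ambiguous sentence.
    logits = [0.2, 1.0, 1.3]
    for t in (0.0, 0.7, 1.5):
        labels = run_trials(logits, t, n_runs=200)
        print(f"T={t}: agreement={agreement_rate(labels):.2f}")

    rng = random.Random(1)
    single = [sample_label(logits, 1.0, rng) for _ in range(200)]
    voted = [ensembled_label(logits, 1.0, 9, rng) for _ in range(200)]
    print("single-sample agreement:", round(agreement_rate(single), 2))
    print("9-vote ensemble agreement:", round(agreement_rate(voted), 2))
```

In this toy setting, lowering T concentrates probability on the top label and raises agreement, while the 9-vote ensemble should typically stabilize the final label at the cost of nine sampling calls per sentence, which mirrors the computational cost the abstract attributes to ensembling.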

Abstract (translated)

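The impact of non-deterministic outputs from large language models (LLMs) has not been well studied in financial text understanding tasks. Through a compelling case study, we find substantial variability in sentence-level sentiment classification results, highlighting the inherent volatility of LLM outputs. These uncertainties propagate downstream, leading to even more pronounced differences in portfolio construction and returns. Although adjusting the temperature parameter of the language model decoder is a potential solution, it comes at the cost of limiting creativity. Similarly, ensembling multiple outputs can mitigate the impact of volatile outputs, but it requires a notable computational investment. This work offers practitioners valuable lessons for better handling uncertainty when integrating LLMs into financial decision-making processes, especially in scenarios driven by non-deterministic information.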

URL

https://arxiv.org/abs/2311.15180

PDF

https://arxiv.org/pdf/2311.15180.pdf