Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark

Abstract

Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, a critical task across many domains, including healthcare, finance, climate, and energy. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, encompassing both univariate and multivariate forms. We introduce a comprehensive taxonomy of time series features that delineates the characteristics inherent in time series data. Leveraging this taxonomy, we have systematically designed and synthesized a diverse dataset of time series embodying the outlined features. This dataset serves as a foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models readily comprehend and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of points queried within a series, and the overall time series length.

URL

https://arxiv.org/abs/2404.16563

PDF

https://arxiv.org/pdf/2404.16563.pdf
