Abstract
Accurately forecasting long-term atmospheric variables remains a defining challenge in meteorological science due to the chaotic nature of atmospheric systems. Temperature data represents a complex superposition of deterministic cyclical climate forces and stochastic, short-term fluctuations. While planetary mechanics drive predictable seasonal periodicities, rapid meteorological changes such as thermal variations, pressure anomalies, and humidity shifts introduce nonlinear volatilities that defy simple extrapolation. Historically, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model has been the standard for modeling historical weather data, prized for capturing linear seasonal trends. However, SARIMA assumes linearity and stationarity (after differencing), so it fails to capture abrupt, nonlinear transitions. This leads to systematic residual errors, manifesting as the under-prediction of sudden spikes or the over-smoothing of declines. Conversely, Deep Learning paradigms, specifically Long Short-Term Memory (LSTM) networks, demonstrate exceptional efficacy in handling intricate time-series data. By utilizing memory gates, LSTMs learn complex nonlinear dependencies. Yet LSTMs face instability in open-loop forecasting: without ground-truth feedback, minor deviations compound recursively, causing divergence. To resolve these limitations, we propose a Hybrid SARIMA-LSTM architecture. This framework employs a residual-learning strategy to decompose temperature into a predictable climate component and a nonlinear weather component. The SARIMA unit models the robust, long-term seasonal trend, while the LSTM is trained exclusively on the residuals, that is, the nonlinear errors that SARIMA fails to capture. By fusing statistical stability with neural plasticity, this hybrid approach minimizes error propagation and enhances long-horizon accuracy.
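The residual-learning decomposition described above can be sketched in a few lines. This is only an illustration of the data flow, not the authors' implementation: to keep it dependency-light, a per-phase seasonal mean stands in for the SARIMA unit and a least-squares autoregressive model stands in for the LSTM. The function names (`seasonal_baseline`, `fit_residual_model`, `hybrid_forecast`) and the parameters `period`, `order`, and `horizon` are hypothetical and do not come from the paper.

```python
import numpy as np


def seasonal_baseline(y, period):
    """Stand-in for the SARIMA unit (assumption): mean value of each seasonal phase."""
    phases = np.arange(len(y)) % period
    return np.array([y[phases == p].mean() for p in range(period)])


def fit_residual_model(residuals, order):
    """Stand-in for the LSTM (assumption): least-squares AR(order) fit on the residuals."""
    n = len(residuals)
    # Row t holds residuals[t : t + order], oldest first; the target is residuals[t + order].
    X = np.column_stack([residuals[i:n - order + i] for i in range(order)])
    target = residuals[order:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def hybrid_forecast(y, period, order, horizon):
    """Forecast = seasonal component + recursively forecast residual component."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # 1. The "climate" component: fit the seasonal model on the raw series.
    profile = seasonal_baseline(y, period)
    fitted = profile[np.arange(n) % period]

    # 2. The "weather" component: whatever the seasonal model fails to explain.
    residuals = y - fitted
    coef = fit_residual_model(residuals, order)

    # 3. Forecast the residuals recursively, feeding predictions back as inputs.
    history = list(residuals[-order:])
    residual_forecast = []
    for _ in range(horizon):
        nxt = float(np.dot(coef, history[-order:]))
        residual_forecast.append(nxt)
        history.append(nxt)

    # 4. Recombine the two components.
    seasonal_forecast = profile[np.arange(n, n + horizon) % period]
    return seasonal_forecast + np.array(residual_forecast)
```

In this sketch only the residual model is rolled forward recursively, while the seasonal component is extrapolated directly. That is one way to read the abstract's claim about limiting error propagation: compounding errors are confined to the residual term rather than to the full temperature signal. A faithful version would replace the two stand-ins with a fitted SARIMA model and an LSTM trained on its residuals.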
URL
https://arxiv.org/abs/2601.07951