Abstract
Transformers have recently gained prominence in long time series forecasting by improving accuracy across a variety of use cases. Regrettably, in the race for better predictive performance, the overhead of model architectures has grown onerous, leading to models whose computational demands are infeasible for most practical applications. To bridge the gap between high method complexity and realistic computational resources, we introduce the Residual Cyclic Transformer, ReCycle. ReCycle utilizes primary cycle compression to address the computational complexity of the attention mechanism on long time series. By learning residuals from refined smoothing average techniques, ReCycle surpasses state-of-the-art accuracy in a variety of application use cases. Its reliable and explainable fallback behavior, ensured by simple yet robust smoothing average techniques, additionally lowers the barrier to user acceptance. At the same time, our approach reduces run time and energy consumption by more than an order of magnitude, making both training and inference feasible on low-performance, low-power, and edge computing devices. Code is available at this https URL
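The core idea of learning residuals against a primary-cycle smoothing average can be illustrated with a minimal sketch. Everything here is an assumption for illustration only, not the paper's actual implementation: the cycle length, the toy series, and the helper functions `cycle_profile` and `residuals` are hypothetical. The sketch averages each position across full cycles to form a smoothed profile, then subtracts the repeated profile so that a model (omitted here) only needs to learn the remaining residuals.

```python
import numpy as np

def cycle_profile(series: np.ndarray, cycle_len: int) -> np.ndarray:
    """Average each position across full cycles (a simple smoothing average).
    Hypothetical helper, not from the ReCycle codebase."""
    n_cycles = len(series) // cycle_len
    trimmed = series[: n_cycles * cycle_len]
    return trimmed.reshape(n_cycles, cycle_len).mean(axis=0)

def residuals(series: np.ndarray, cycle_len: int) -> np.ndarray:
    """Subtract the tiled cycle profile; a forecasting model would
    learn these residuals instead of the raw series."""
    profile = cycle_profile(series, cycle_len)
    reps = -(-len(series) // cycle_len)  # ceil division
    return series - np.tile(profile, reps)[: len(series)]

# Toy series: a noisy repeating cycle of length 4 (assumed, illustrative).
rng = np.random.default_rng(0)
series = np.tile([1.0, 3.0, 2.0, 0.5], 8) + 0.01 * rng.standard_normal(32)

res = residuals(series, cycle_len=4)
# After removing the cycle profile, residuals are near zero; a forecast
# would be profile + predicted residual, with the profile alone serving
# as the simple, explainable fallback when the residual model outputs zero.
```

The fallback behavior mentioned in the abstract corresponds here to the case where the residual model contributes nothing: the forecast degrades gracefully to the smoothed cycle profile rather than failing arbitrarily.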
URL
https://arxiv.org/abs/2405.03429