Abstract
Diffusion Transformers (DiT) have achieved milestones in synthesizing financial time-series data, such as stock prices and order flows. However, their performance in synthesizing treasury futures data remains underexplored. This work focuses on the characteristics of treasury futures data, including its low volume, market dependencies, and the grouped correlations among multiple variables. To address these challenges, we propose TF-CoDiT, the first DiT framework for language-controlled treasury futures synthesis. To facilitate low-data learning, TF-CoDiT adapts the standard DiT by transforming multi-channel 1-D time series into Discrete Wavelet Transform (DWT) coefficient matrices. A U-shape VAE is proposed to encode cross-channel dependencies hierarchically into a latent variable and to bridge the latent and DWT spaces through decoding, thereby enabling latent diffusion generation. To derive prompts that cover essential conditions, we introduce the Financial Market Attribute Protocol (FinMAP), a multi-level description system that standardizes daily/periodical market dynamics by recognizing 17/23 economic indicators from 7/8 perspectives. In our experiments, we gather four types of treasury futures data covering the period from 2015 to 2025 and define data synthesis tasks with durations ranging from one week to four months. Extensive evaluations demonstrate that TF-CoDiT can produce highly authentic data, with errors of at most 0.433 (MSE) and 0.453 (MAE) relative to the ground truth. Further studies provide evidence of the robustness of TF-CoDiT across contracts and temporal horizons.
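The step of turning a multi-channel 1-D series into a DWT coefficient matrix can be illustrated with a minimal sketch. The abstract does not state which wavelet, how many decomposition levels, or what matrix layout TF-CoDiT uses, so everything below is an assumption made for illustration: a single-level orthonormal Haar transform, one matrix row per channel laid out as approximation coefficients followed by detail coefficients, and made-up sample values. The function names are hypothetical and are not taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT for an even-length sequence.

    Returns (approximation, detail) coefficient lists, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, recovering the original sequence."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def to_coefficient_matrix(channels):
    """Assumed layout: one row per channel, [approximation | detail]."""
    rows = []
    for channel in channels:
        approx, detail = haar_dwt(channel)
        rows.append(approx + detail)
    return rows


# Made-up example: two channels (say, a price and a volume series), four steps each.
channels = [
    [100.0, 100.5, 101.0, 100.0],
    [10.0, 12.0, 9.0, 11.0],
]
matrix = to_coefficient_matrix(channels)
```

Because the Haar transform is orthonormal, `haar_idwt` recovers each channel exactly (up to floating-point error), which is the property that lets a model operate in the coefficient space and still map its output back to a time series.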
URL
https://arxiv.org/abs/2601.11880