While deep learning has excelled in many domains, applying it to sequential decision-making in finance remains challenging because financial data have a low Signal-to-Noise Ratio (SNR) and are non-stationary. Leveraging the reasoning capabilities of Large Language Models (LLMs), we propose \textbf{PandaAI}, a closed-loop neuro-symbolic LLM agent with market regime modeling and constrained alpha generation that bridges general LLM reasoning and financial rigor while suppressing the financial toxicity of LLM-generated outputs. To close the gap between general linguistic capability and financial rigor, we fine-tune a domain-specific LLM and then integrate it into a modular architecture to form a closed-loop system. Unlike traditional models that optimize isolated prediction metrics, \textbf{PandaAI} is designed as a neuro-symbolic agent that navigates the complex, real-world financial environment with explicit risk awareness. Extensive experiments on CSI 300 stock data show that \textbf{PandaAI} achieves an $18.2\%$ higher Rank IC and a $25.7\%$ lower maximum drawdown than state-of-the-art time-series models. Our constrained LLM generation and dual-channel adaptation methods provide a general paradigm for deploying LLMs in high-stakes sequential decision-making.
https://arxiv.org/abs/2606.06823
We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpose algorithm optimization framework inspired by DeepMind's AlphaEvolve, was recently developed to optimize algorithms in computational cosmology. Here we demonstrate the utility of MadEvolve for optimizing algorithmic trading strategies and alpha generation, using Bitcoin trading as an example. In our simulation and backtesting setup, we achieve significant improvements on all tasks we considered, including evolving feature sets for signal generation, optimizing individual components of the trading strategy, and jointly evolving the feature pipeline together with the execution strategy. We also compare our method with other agentic search approaches, specifically Claude Code, and carefully evaluate p-hacking probabilities in our simulation setup. Our findings strongly support the utility of AI-driven agentic and evolutionary algorithms for algorithmic trading and quantitative finance.
https://arxiv.org/abs/2605.23007
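MadEvolve, like AlphaEvolve, repeatedly proposes modified candidates, scores them, and keeps the best. A minimal sketch of that loop, assuming an elitist selection rule and replacing the LLM proposal step with a hypothetical Gaussian mutation of two strategy parameters; `toy_fitness` is an invented stand-in for a backtest score, not MadEvolve's objective.

```python
import random

def evolve(fitness, init, mutate, generations=50, pop_size=8, seed=0):
    """Elitist evolutionary loop: parents and children compete for the next population.

    In an LLM-driven optimizer, `mutate` would prompt the model to rewrite a
    candidate program; here it is any function (candidate, rng) -> candidate.
    """
    rng = random.Random(seed)
    population = [init]
    best_history = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Keeping parents in the pool means the best score can never decrease.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        best_history.append(fitness(population[0]))
    return population[0], best_history

def toy_fitness(params):
    # Hypothetical stand-in for a backtest score, maximized at (3, -1).
    x, y = params
    return -((x - 3) ** 2 + (y + 1) ** 2)

def gaussian_mutation(params, rng):
    # Hypothetical stand-in for an LLM-proposed code change.
    x, y = params
    return (x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5))

best, history = evolve(toy_fitness, (0.0, 0.0), gaussian_mutation)
```

Because the loop keeps whichever candidate scores best in-sample, its reported score is optimistically biased when many candidates are tried; this selection effect is why the p-hacking evaluation mentioned in the abstract matters, and final scores should come from data the loop never saw.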
Large language models (LLMs) are increasingly deployed in quantitative finance for stock price forecasting. This review synthesizes recent applications of LLMs in this domain, including extracting sentiment from financial news and social media, analyzing financial reports and earnings-call transcripts, tokenizing or symbolizing stock price series, and constructing multi-agent trading systems. Particular attention is paid to practical pitfalls that are often understated in the literature, such as fragile sentiment analysis, dataset and horizon design, the choice of performance evaluation metrics, data leakage, illiquidity premia, and the limits of stock price predictability. Organized from a hedge-fund perspective, the review is intended to guide both academic researchers and hedge fund managers in integrating LLMs into real-world trading pipelines and in stress-testing their robustness under realistic market frictions.
https://arxiv.org/abs/2605.05211
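Two of the pitfalls the review lists, data leakage and horizon design, interact: if the label at day t is the return over the next h days, the last h training labels overlap the test period unless a gap is left. A minimal sketch of a rolling walk-forward split with such a gap (an "embargo"), assuming index-aligned daily observations; the function and its parameters are illustrative, not from the review.

```python
def walk_forward_splits(n_obs, train_size, test_size, horizon):
    """Rolling walk-forward splits with an embargo of `horizon` observations.

    A label at index t is assumed to use returns over (t, t + horizon], so the last
    training label ends at train_end - 1 + horizon; starting the test window at
    train_end + horizon keeps every training label strictly before the test period.
    """
    start = 0
    while start + train_size + horizon + test_size <= n_obs:
        train_end = start + train_size
        test_start = train_end + horizon
        yield range(start, train_end), range(test_start, test_start + test_size)
        start += test_size

splits = list(walk_forward_splits(100, train_size=50, test_size=10, horizon=5))
```

With 100 observations this yields four folds, the first training on indices 0 to 49 and testing on 55 to 64.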
We present the Consilium Protocol, a Byzantine Fault Tolerance-derived architecture for structured multi-model AI deliberation that treats inter-model disagreement as epistemic signal rather than error. The protocol assigns engineered cognitive personas to language models, separating what a model is from how it reasons, and introduces an In-Sample/Out-of-Sample validation framework adapted from quantitative finance to distinguish training-data consensus from empirically grounded conclusions. Across 1,478 deliberation sessions spanning 32 topics in 10 domain categories, we demonstrate that (1) the cognitive persona, not the underlying model, determines epistemic behavior: free edge-inference models costing 0.0002 USD per batch produced analytical output comparable to that of frontier models costing 10.69 USD; (2) RLHF alignment training creates measurable, domain-specific epistemic blind spots: contested policy topics exhibit 12.3 percentage points less adversarial challenge than settled science topics, and AI safety topics show an asymmetric bias ($\Delta$=11.6%) in which models challenge claims that AI is dangerous far more vigorously than claims that AI risk is overstated; (3) the protocol exhibits no directional bias of its own (immigration $\Delta$=2.3%, renewables $\Delta$=1.2%); and (4) out-of-sample evidence retrieval validated 239 claims with 100% evidence retrieval and surfaced 167 blind-spot discoveries invisible to training-data deliberation. Run-to-run reproducibility across randomized model$\times$persona assignments averages $\pm$2.2% standard deviation. The total cost of the complete battery, including all overhead, was 217 USD. We release the protocol specification under the MIT license to enable independent verification.
https://arxiv.org/abs/2606.00005
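The Consilium abstract describes the protocol only as Byzantine Fault Tolerance-derived. For orientation, classical BFT sizing tolerates f faulty participants among n ≥ 3f + 1 and accepts a value only when a quorum agrees (2f + 1 when n = 3f + 1). The sketch below applies that rule to model verdicts and, in the spirit of treating disagreement as signal, returns dissenting verdicts instead of discarding them. The quorum of n - f and the `deliberate` function are assumptions for illustration, not the released specification.

```python
from collections import Counter

def deliberate(verdicts, f):
    """BFT-style aggregation of model verdicts, tolerating up to f unreliable participants.

    Requires n >= 3f + 1 participants and accepts a verdict only if at least n - f
    agree (2f + 1 when n = 3f + 1). Any two such quorums overlap, so at most one
    verdict can reach it. Dissent is returned rather than discarded.
    """
    n = len(verdicts)
    if n < 3 * f + 1:
        raise ValueError(f"need at least {3 * f + 1} participants to tolerate f={f}")
    top, votes = Counter(verdicts).most_common(1)[0]
    if votes >= n - f:
        return {"consensus": top, "dissent": [v for v in verdicts if v != top]}
    # No quorum: keep every position as evidence of genuine disagreement.
    return {"consensus": None, "dissent": list(verdicts)}
```

With four participants and f = 1, three matching verdicts form a quorum; a 2-1-1 split returns no consensus and all four positions.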
Deep learning models in quantitative finance often operate as black boxes, lacking interpretability and failing to incorporate fundamental economic principles such as no-arbitrage constraints. This paper introduces ARTEMIS (Arbitrage-free Representation Through Economic Models and Interpretable Symbolics), a novel neuro-symbolic framework combining a continuous-time Laplace Neural Operator encoder, a neural stochastic differential equation regularised by physics-informed losses, and a differentiable symbolic bottleneck that distils interpretable trading rules. The model enforces economic plausibility via two novel regularisation terms: a Feynman-Kac PDE residual penalising local no-arbitrage violations, and a market price of risk penalty bounding the instantaneous Sharpe ratio. We evaluate ARTEMIS against six strong baselines on four datasets: Jane Street, Optiver, Time-IMM, and DSLOB (a synthetic crash regime). Results show that ARTEMIS achieves state-of-the-art directional accuracy, outperforming all baselines on DSLOB (64.96%) and Time-IMM (96.0%). A comprehensive ablation study confirms each component's contribution; for example, removing the PDE loss reduces directional accuracy from 64.89% to 50.32%. We attribute the underperformance on Optiver to its long sequence length and volatility-focused target. By providing interpretable, economically grounded predictions, ARTEMIS bridges the gap between deep learning's power and the transparency demanded in quantitative finance.
https://arxiv.org/abs/2603.18107
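ARTEMIS penalizes a Feynman-Kac PDE residual and the market price of risk. Under the simplest assumption of geometric Brownian motion with constant rate $r$ and volatility $\sigma$ (ARTEMIS learns a neural SDE, so its PDE will generally have state-dependent coefficients), a no-arbitrage price $V(t,S)$ satisfies $V_t + rSV_S + \tfrac{1}{2}\sigma^2 S^2 V_{SS} - rV = 0$. The sketch evaluates that residual with central finite differences (a neural model would typically use automatic differentiation), checks that the Black-Scholes call price drives it to about zero, and adds a hinge penalty on the market price of risk $(\mu - r)/\sigma$; the penalty forms and the bound `lam_max` are assumptions.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, s, k=100.0, T=1.0, r=0.03, sigma=0.2):
    """Black-Scholes price of a European call at calendar time t < T."""
    tau = T - t
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)

def fk_residual(V, t, s, r=0.03, sigma=0.2, ht=1e-5, hs=1e-2):
    """V_t + r S V_S + 0.5 sigma^2 S^2 V_SS - r V, using central differences."""
    v_t = (V(t + ht, s) - V(t - ht, s)) / (2 * ht)
    v_s = (V(t, s + hs) - V(t, s - hs)) / (2 * hs)
    v_ss = (V(t, s + hs) - 2 * V(t, s) + V(t, s - hs)) / hs ** 2
    return v_t + r * s * v_s + 0.5 * sigma ** 2 * s ** 2 * v_ss - r * V(t, s)

def pde_penalty(V, points, **kw):
    """Mean squared residual over collocation points (t, S)."""
    return sum(fk_residual(V, t, s, **kw) ** 2 for t, s in points) / len(points)

def risk_price_penalty(mu, sigma, r=0.03, lam_max=2.0):
    """Hinge penalty when the instantaneous Sharpe ratio |(mu - r)/sigma| exceeds lam_max."""
    lam = (mu - r) / sigma
    return max(0.0, abs(lam) - lam_max) ** 2
```

A price surface such as $V = S^2$ violates the PDE and receives a large residual, which is the kind of local no-arbitrage violation the penalty is meant to push down.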
Discovering predictive alpha factors in quantitative finance remains a formidable challenge due to the vast combinatorial search space and inherently low signal-to-noise ratios in financial data. Existing automated methods, particularly genetic programming, often produce complex, uninterpretable formulas prone to overfitting. We introduce Hubble, a closed-loop factor mining framework that leverages Large Language Models (LLMs) as intelligent search heuristics, constrained by a domain-specific operator language and an Abstract Syntax Tree (AST)-based execution sandbox. The framework evaluates candidate factors through a rigorous statistical pipeline encompassing cross-sectional Rank Information Coefficient (RankIC), annualized Information Ratio, and portfolio turnover. An evolutionary feedback mechanism returns top-performing factors and structured error diagnostics to the LLM, enabling iterative refinement across multiple generation rounds. In experiments conducted on a panel of 30 U.S. equities over 752 trading days, the system evaluated 181 syntactically valid factors from 122 unique candidates across three rounds, achieving a peak composite score of 0.827 with 100% computational stability. Our results demonstrate that combining LLM-driven generation with deterministic safety constraints yields an effective, interpretable, and reproducible approach to automated factor discovery.
https://arxiv.org/abs/2604.09601
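Hubble constrains LLM proposals with a domain-specific operator language and an AST-based sandbox. A minimal sketch of the validation half using Python's standard `ast` module, assuming a hypothetical grammar of price and volume fields plus a few time-series operators (`rank`, `delta`, `ts_mean`, `abs`, `log` are illustrative names, not Hubble's operator set). The returned message plays the role of the structured error diagnostics fed back to the LLM.

```python
import ast

ALLOWED_FIELDS = {"close", "open", "high", "low", "volume"}
ALLOWED_OPS = {"rank", "delta", "ts_mean", "abs", "log"}  # hypothetical operator names
ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
                 ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)

def validate_factor(expr):
    """Parse a factor formula; reject anything outside the whitelisted grammar."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False, f"disallowed syntax: {type(node).__name__}"
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted operators, positional arguments only.
            if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_OPS:
                return False, "unknown operator"
            if node.keywords:
                return False, "keyword arguments not allowed"
        elif isinstance(node, ast.Name) and node.id not in ALLOWED_FIELDS | ALLOWED_OPS:
            return False, f"unknown name: {node.id}"
        elif isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False, "only numeric constants allowed"
    return True, "ok"
```

Attribute access, imports, and arbitrary function calls never reach an evaluator, because any node type outside the whitelist is rejected before execution.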
LLMs have demonstrated significant potential in quantitative finance by processing vast unstructured data to emulate human-like analytical workflows. However, current LLM-based methods primarily follow either an Asset-Centric paradigm focused on individual stock prediction or a Market-Centric approach for portfolio allocation, often remaining agnostic to the underlying reasoning that drives market movements. In this paper, we propose a Logic-Oriented perspective, modeling the financial market as a dynamic, evolutionary ecosystem of competing investment narratives, termed Modes of Thought. To operationalize this view, we introduce MEME (Modeling the Evolutionary Modes of Financial Markets), designed to reconstruct market dynamics through the lens of evolving logics. MEME employs a multi-agent extraction module to transform noisy data into high-fidelity Investment Arguments and uses Gaussian Mixture Modeling to uncover latent consensus within a semantic space. To model semantic drift across market conditions, we also implement a temporal evaluation and alignment mechanism that tracks the lifecycle and historical profitability of these modes. By prioritizing enduring market wisdom over transient anomalies, MEME ensures that portfolio construction is guided by robust reasoning. Extensive experiments on three heterogeneous Chinese stock pools from 2023 to 2025 demonstrate that MEME consistently outperforms seven SOTA baselines. Further ablation studies, sensitivity analysis, a lifecycle case study, and cost analysis validate MEME's capacity to identify and adapt to the evolving consensus of financial markets. Our implementation can be found at this https URL.
https://arxiv.org/abs/2602.11918
Extracting signals through alpha factor mining is a fundamental challenge in quantitative finance. Existing automated methods primarily follow two paradigms: Decoupled Factor Generation, which treats factor discovery as isolated events, and Iterative Factor Evolution, which focuses on local parent-child refinements. However, both paradigms lack a global structural view, often treating factor pools as unstructured collections or fragmented chains, which leads to redundant search and limited diversity. To address these limitations, we introduce AlphaPROBE (Alpha Mining via Principled Retrieval and On-graph Biased Evolution), a framework that reframes alpha mining as the strategic navigation of a Directed Acyclic Graph (DAG). By modeling factors as nodes and evolutionary links as edges, AlphaPROBE treats the factor pool as a dynamic, interconnected ecosystem. The framework consists of two core components: a Bayesian Factor Retriever that identifies high-potential seeds by balancing exploitation and exploration through a posterior probability model, and a DAG-aware Factor Generator that leverages the full ancestral trace of factors to produce context-aware, non-redundant optimizations. Extensive experiments on three major Chinese stock market datasets against 8 competitive baselines demonstrate that AlphaPROBE achieves significant gains in predictive accuracy, return stability, and training efficiency. Our results confirm that leveraging global evolutionary topology is essential for efficient and robust automated alpha discovery. We have open-sourced our implementation at this https URL.
https://arxiv.org/abs/2602.11917
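AlphaPROBE's two components can be illustrated separately: collecting a factor's full ancestral trace in the DAG, and choosing the next seed by a posterior that balances exploitation and exploration. A minimal sketch, assuming a parent map for the DAG and Beta-Bernoulli Thompson sampling as the posterior model (the abstract does not specify the distribution; that choice and all names here are hypothetical).

```python
import random

def ancestral_trace(node, parents):
    """All ancestors of `node` in a factor DAG, each listed after its own ancestors.

    `parents` maps a factor id to the ids it was derived from.
    """
    seen, order = set(), []

    def visit(n):
        for p in parents.get(n, ()):
            if p not in seen:
                seen.add(p)
                visit(p)
                order.append(p)

    visit(node)
    return order

def pick_seed(stats, rng):
    """Thompson sampling over candidate seed factors.

    `stats` maps factor id -> (successes, failures), e.g. how often refining that
    factor produced a better child. Each success rate gets a Beta(1 + s, 1 + f)
    posterior; the factor with the largest posterior draw is refined next.
    """
    draws = {fid: rng.betavariate(1 + s, 1 + f) for fid, (s, f) in stats.items()}
    return max(draws, key=draws.get)

parents = {"f3": ["f1", "f2"], "f2": ["f1"]}
trace = ancestral_trace("f3", parents)  # ['f1', 'f2']
```

The trace is what a generator prompt would include as context, so the LLM can see which transformations were already tried along that lineage.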
In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle to building reliable data-driven systems. Models trained on static historical data often overfit, resulting in poor generalization in dynamic markets. The mantra "History Is Not Enough" underscores the need for adaptive data generation that learns to evolve with the market rather than relying solely on past observations. We present a drift-aware dataflow system that integrates machine learning-based adaptive control into the data curation process. The system couples a parameterized data manipulation module, comprising single-stock transformations, multi-stock mix-ups, and curation operations, with an adaptive planner-scheduler that uses gradient-based bi-level optimization to control the system. This design unifies data augmentation, curriculum learning, and data workflow management under a single differentiable framework, enabling provenance-aware replay and continuous data quality monitoring. Extensive experiments on forecasting and reinforcement learning trading tasks demonstrate that our framework enhances model robustness and improves risk-adjusted returns. The system provides a generalizable approach to adaptive data management and learning-guided workflow automation for financial data.
https://arxiv.org/abs/2601.10143
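One of the operations in the drift-aware system is a multi-stock mix-up. A minimal sketch, assuming standard mixup applied to two date-aligned return series and their targets with a mixing weight drawn from Beta(alpha, alpha); in the paper the manipulation parameters are tuned by bi-level optimization, whereas here `alpha` is a fixed, hypothetical value.

```python
import random

def mixup_series(x_a, y_a, x_b, y_b, alpha=0.4, rng=None):
    """Mix two aligned return series and their targets with lambda ~ Beta(alpha, alpha).

    Mixing returns (not prices) keeps the result on a comparable scale; the target
    is mixed with the same lambda so input and label stay consistent.
    """
    if len(x_a) != len(x_b):
        raise ValueError("series must be aligned on the same dates")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1 - lam) * y_b
    return lam, x_mix, y_mix
```

With a small `alpha` most draws of lambda land near 0 or 1, so mixed samples stay close to one of the real stocks.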
Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation in financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate several state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
https://arxiv.org/abs/2601.08689
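QuantEval scores generated strategies by running them through a deterministic CTA-style backtest with a fixed cost model and metric definitions. A minimal single-asset sketch of such a harness, assuming a trend-following signal, a linear cost per unit of turnover, and a 252-day annualization; none of these parameter values come from the benchmark's released configuration.

```python
import statistics

def trend_positions(prices, lookback=20):
    """CTA-style trend signal: +1 if the close is above its trailing mean, else -1.

    Uses only prices up to and including day t; 0 during the warm-up period.
    """
    pos = []
    for t in range(len(prices)):
        if t + 1 < lookback:
            pos.append(0.0)
        else:
            window = prices[t + 1 - lookback : t + 1]
            pos.append(1.0 if prices[t] > sum(window) / lookback else -1.0)
    return pos

def backtest(prices, positions, cost_per_turnover=0.0005):
    """Daily net returns: position held from close t to close t+1, minus linear costs."""
    pnl, prev = [], 0.0
    for t in range(len(prices) - 1):
        ret = prices[t + 1] / prices[t] - 1
        cost = cost_per_turnover * abs(positions[t] - prev)
        pnl.append(positions[t] * ret - cost)
        prev = positions[t]
    return pnl

def annualized_sharpe(pnl, periods_per_year=252):
    return statistics.mean(pnl) / statistics.stdev(pnl) * periods_per_year ** 0.5
```

Because the position decided at close t only earns the return from t to t+1, a strategy cannot profit from information it did not yet have, which is the property a deterministic harness must guarantee.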
This paper investigates how Large Language Models (LLMs) from leading providers (OpenAI, Google, Anthropic, DeepSeek, and xAI) can be applied to quantitative sector-based portfolio construction. We use LLMs to identify investable universes of stocks within S&P 500 sector indices and evaluate how their selections perform when combined with classical portfolio optimization methods. Each model was prompted to select and weight 20 stocks per sector, and the resulting portfolios were compared with their respective sector indices across two distinct out-of-sample periods: a stable market phase (January-March 2025) and a volatile phase (April-June 2025). Our results reveal a strong temporal dependence in LLM portfolio performance. During stable market conditions, LLM-weighted portfolios frequently outperformed sector indices on both cumulative return and risk-adjusted (Sharpe ratio) measures. However, during the volatile period, many LLM portfolios underperformed, suggesting that current models may struggle to adapt to regime shifts or high-volatility environments underrepresented in their training data. Importantly, when LLM-based stock selection is combined with traditional optimization techniques, portfolio outcomes improve in both performance and consistency. This study contributes one of the first multi-model, cross-provider evaluations of generative AI algorithms in investment management. It highlights that while LLMs can effectively complement quantitative finance by enhancing stock selection and interpretability, their reliability remains market-dependent. The findings underscore the potential of hybrid AI-quantitative frameworks, integrating LLM reasoning with established optimization techniques, to produce more robust and adaptive investment strategies.
https://arxiv.org/abs/2512.24526
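The study finds that LLM stock selection works better when followed by classical optimization. As one simple instance of that second step (an assumption; the abstract does not name the optimizers used), the long-only minimum-variance portfolio under an uncorrelated-returns assumption weights each selected stock by the inverse of its return variance. The tickers and returns below are made up.

```python
import statistics

def inverse_variance_weights(returns_by_ticker):
    """Long-only weights proportional to 1 / variance of each stock's returns.

    This is the minimum-variance portfolio if returns are assumed uncorrelated
    (diagonal covariance); a full optimizer would use the whole covariance matrix.
    """
    inv = {t: 1.0 / statistics.variance(r) for t, r in returns_by_ticker.items()}
    total = sum(inv.values())
    return {t: v / total for t, v in inv.items()}

# Hypothetical: tickers an LLM selected for one sector, with made-up daily returns.
selected = {"AAA": [0.01, -0.01, 0.012, -0.008], "BBB": [0.02, -0.02, 0.025, -0.018]}
weights = inverse_variance_weights(selected)
```

In this split the LLM decides which names enter the portfolio, and the optimizer alone decides how much of each to hold.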
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where transparent and reproducible analysis of heterogeneous signals is essential. Traditional quantitative methods remain vulnerable to survivorship bias, while many AI-driven approaches struggle with signal integration, reproducibility, and computational efficiency. We introduce MASFIN, a modular multi-agent framework that integrates LLMs with structured financial metrics and unstructured news while embedding explicit bias-mitigation protocols. The system uses GPT-4.1-nano for reproducibility and cost-efficient inference and generates weekly portfolios of 15-30 equities with allocation weights optimized for short-term performance. In an eight-week evaluation, MASFIN delivered a 7.33% cumulative return, outperforming the S&P 500, NASDAQ-100, and Dow Jones benchmarks in six of eight weeks, albeit with higher volatility. These findings demonstrate the promise of bias-aware generative AI frameworks for financial forecasting and highlight opportunities for modular multi-agent design to advance practical, transparent, and reproducible approaches in quantitative finance.
https://arxiv.org/abs/2512.21878
Synthetic financial data offers a practical way to address the privacy and accessibility challenges that limit research in quantitative finance. This paper examines the use of generative models, in particular TimeGAN and Variational Autoencoders (VAEs), for creating synthetic return series that support portfolio construction, trading analysis, and risk modeling. Using historical daily returns from the S&P 500 as a benchmark, we generate synthetic datasets under comparable market conditions and evaluate them using statistical similarity metrics, temporal structure tests, and downstream financial tasks. The study shows that TimeGAN produces synthetic data whose distributional shapes, volatility patterns, and autocorrelation behaviour are close to those observed in real returns. When applied to mean-variance portfolio optimization, the synthetic datasets yield portfolio weights, Sharpe ratios, and risk levels that remain close to those obtained from real data. The VAE provides more stable training but tends to smooth extreme market movements, which affects risk estimation. Finally, the analysis supports the use of synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation, particularly when models are able to capture temporal dynamics. Synthetic data therefore provides a privacy-preserving, cost-effective, and reproducible tool for financial experimentation and model development.
https://arxiv.org/abs/2512.21798
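The paper checks whether synthetic returns reproduce the autocorrelation behaviour of real ones. A minimal sketch of one such temporal structure test, assuming the comparison is an absolute gap in sample autocorrelation at a few lags, both for raw returns and for absolute returns (a common proxy for volatility clustering); the lags and function names are illustrative.

```python
import statistics

def autocorr(x, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    m = statistics.mean(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / den

def stylized_fact_gaps(real, synthetic, lags=(1, 5, 10)):
    """Absolute gaps in return and |return| autocorrelations between two series.

    Raw daily return autocorrelation tends to be near zero, while |return|
    autocorrelation tends to be positive and slowly decaying (volatility clustering).
    """
    gaps = {}
    real_abs = [abs(v) for v in real]
    syn_abs = [abs(v) for v in synthetic]
    for lag in lags:
        gaps[f"acf_ret_lag{lag}"] = abs(autocorr(real, lag) - autocorr(synthetic, lag))
        gaps[f"acf_abs_lag{lag}"] = abs(autocorr(real_abs, lag) - autocorr(syn_abs, lag))
    return gaps
```

A generator that smooths extreme moves, as the abstract reports for the VAE, would typically show its largest gaps in the |return| autocorrelations.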
Robust asset allocation is a key challenge in quantitative finance, where deep-learning forecasters often fail due to objective mismatch and error amplification. We introduce the Signature-Informed Transformer (SIT), a novel framework that learns end-to-end allocation policies by directly optimizing a risk-aware financial objective. SIT's core innovations include path signatures for a rich geometric representation of asset dynamics and a signature-augmented attention mechanism embedding financial inductive biases, like lead-lag effects, into the model. Evaluated on daily S\&P 100 equity data, SIT decisively outperforms traditional and deep-learning baselines, especially when compared to predict-then-optimize models. These results indicate that portfolio-aware objectives and geometry-aware inductive biases are essential for risk-aware capital allocation in machine-learning systems. The code is available at: this https URL
https://arxiv.org/abs/2510.03129
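SIT builds on path signatures, and the lead-lag effects it mentions appear in the antisymmetric part of the level-2 signature (the Lévy area). A minimal sketch computing the depth-2 signature of a piecewise-linear multi-asset path with Chen's identity, assuming the path points are, for example, cumulative log-returns; how SIT truncates signatures and feeds them into attention is not reproduced here.

```python
def signature_depth2(path):
    """Level-1 and level-2 signature of a piecewise-linear path in R^d.

    `path` is a list of points, each a list of d floats. For each linear segment
    with increment dx, Chen's identity gives S2[i][j] += S1[i]*dx[j] + dx[i]*dx[j]/2.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for c, p in zip(cur, prev)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2

def levy_area(s2, i=0, j=1):
    """Antisymmetric part of level 2; positive when coordinate i tends to move before j."""
    return 0.5 * (s2[i][j] - s2[j][i])

# Asset 0 moves first, then asset 1 follows.
s1, s2 = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```

Reversing the order of the two moves flips the sign of the Lévy area while leaving the level-1 increments unchanged, which is why signatures can encode lead-lag structure that plain returns miss.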
Generative modeling of high-frequency limit order book (LOB) dynamics is a critical yet unsolved challenge in quantitative finance, essential for robust market simulation and strategy backtesting. Existing approaches are often constrained by simplifying stochastic assumptions or, in the case of modern deep learning models like Transformers, rely on tokenization schemes that compromise the high-precision numerical nature of financial data through discretization and binning. To address these limitations, we introduce ByteGen, a novel generative model that operates directly on the raw byte streams of LOB events. Our approach treats the problem as an autoregressive next-byte prediction task, for which we design a compact and efficient 32-byte packed binary format that represents market messages without information loss. The core novelty of our work is the complete elimination of feature engineering and tokenization, enabling the model to learn market dynamics from their most fundamental representation. We achieve this by adapting the H-Net architecture, a hybrid Mamba-Transformer model that uses a dynamic chunking mechanism to discover the inherent structure of market messages without predefined rules. Our primary contributions are: 1) the first end-to-end, byte-level framework for LOB modeling; 2) an efficient packed data representation; and 3) a comprehensive evaluation on high-frequency data. Trained on over 34 million events from CME Bitcoin futures, ByteGen successfully reproduces key stylized facts of financial markets, generating realistic price distributions, heavy-tailed returns, and bursty event timing. Our findings demonstrate that learning directly from byte space is a promising and highly flexible paradigm for modeling complex financial systems, achieving competitive performance on standard market quality metrics without the biases of tokenization.
https://arxiv.org/abs/2508.02247
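ByteGen packs each LOB message into 32 bytes and predicts the stream byte by byte. The abstract does not give the field layout, so the one below is hypothetical; it only shows that a fixed-width, integer-only encoding (prices as integer ticks) round-trips losslessly with Python's standard `struct` module and turns events into a token sequence with a vocabulary of 256.

```python
import struct

# Hypothetical 32-byte little-endian message layout; the paper's field order is not given.
#   ts_ns    uint64  event timestamp in nanoseconds
#   price    int64   price in integer ticks (exact, no float rounding)
#   order_id uint64  exchange order id
#   size     uint32  quantity
#   side     uint8   0 = bid, 1 = ask
#   kind     uint8   event type code (e.g. add, cancel, trade)
#   flags    uint16  reserved
MSG = struct.Struct("<QqQIBBH")
assert MSG.size == 32  # '<' disables alignment padding

def pack_event(ts_ns, price, order_id, size, side, kind, flags=0):
    return MSG.pack(ts_ns, price, order_id, size, side, kind, flags)

def unpack_event(buf):
    return MSG.unpack(buf)

def to_byte_tokens(events):
    """Concatenate packed events into one stream; each byte (0-255) is one model token."""
    return list(b"".join(pack_event(*e) for e in events))
```

Because every message has the same width, message boundaries fall at multiples of 32 bytes, a regularity a dynamic-chunking model can discover without being told.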
Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce \textsf{CTBench}, the first comprehensive TSG benchmark tailored for the cryptocurrency domain. \textsf{CTBench} curates an open-source dataset from 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: (1) the \emph{Predictive Utility} task measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while (2) the \emph{Statistical Arbitrage} task assesses whether reconstructed series support mean-reverting signals for trading. We benchmark eight representative models from five methodological families over four distinct market regimes, uncovering trade-offs between statistical fidelity and real-world profitability. Notably, \textsf{CTBench} offers model ranking analysis and actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development.
https://arxiv.org/abs/2508.02758
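CTBench's Statistical Arbitrage task asks whether reconstructed series still support mean-reverting trading signals. A minimal sketch of such a signal, assuming a generic rolling z-score rule with a hypothetical window and entry threshold (not CTBench's exact strategy); running it on real and synthetic versions of the same series and comparing the resulting trades is one way to probe that utility.

```python
import statistics

def zscore_signal(prices, window=20, entry=2.0):
    """Mean-reversion signal from a rolling z-score of price.

    Short (-1) when the price is more than `entry` standard deviations above its
    trailing mean, long (+1) when below, flat (0) otherwise and during warm-up.
    """
    signals = []
    for t in range(len(prices)):
        if t + 1 < window:
            signals.append(0)
            continue
        hist = prices[t + 1 - window : t + 1]
        mu, sd = statistics.mean(hist), statistics.pstdev(hist)
        z = 0.0 if sd == 0 else (prices[t] - mu) / sd
        signals.append(-1 if z > entry else 1 if z < -entry else 0)
    return signals
```

A generator that reproduces marginal return statistics but destroys the reversion after spikes would trigger the same entries yet lose money on them, which is the kind of fidelity-versus-profitability trade-off the benchmark reports.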
Financial markets pose fundamental challenges for asset return prediction due to their high dimensionality, non-stationarity, and persistent volatility. Despite advances in large language models and multi-agent systems, current quantitative research pipelines suffer from limited automation, weak interpretability, and fragmented coordination across key components such as factor mining and model innovation. In this paper, we propose R&D-Agent for Quantitative Finance, RD-Agent(Q) for short, the first data-centric multi-agent framework designed to automate the full-stack research and development of quantitative strategies via coordinated factor-model co-optimization. RD-Agent(Q) decomposes the quant process into two iterative stages: a Research stage that dynamically sets goal-aligned prompts, formulates hypotheses based on domain priors, and maps them to concrete tasks; and a Development stage that employs a code-generation agent, Co-STEER, to implement task-specific code, which is then executed in real-market backtests. The two stages are connected through a feedback stage that thoroughly evaluates experimental outcomes and informs subsequent iterations, with a multi-armed bandit scheduler for adaptive direction selection. Empirically, RD-Agent(Q) achieves up to 2X higher annualized returns than classical factor libraries while using 70% fewer factors, and outperforms state-of-the-art deep time-series models on real markets. Its joint factor-model optimization delivers a strong balance between predictive accuracy and strategy robustness. Our code is available at: this https URL.
https://arxiv.org/abs/2505.15155
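RD-Agent(Q) uses a multi-armed bandit scheduler to decide whether the next iteration works on factors or on the model. A minimal sketch, assuming the standard UCB1 rule (the abstract does not name the bandit algorithm) and a hypothetical `reward_fn` returning a normalized improvement in [0, 1] after each iteration.

```python
import math

def ucb1_schedule(reward_fn, arms=("factor", "model"), rounds=20, c=math.sqrt(2)):
    """Choose which direction to work on each round with the UCB1 rule.

    reward_fn(arm, round_index) -> float in [0, 1], e.g. a normalized gain in
    backtest score after one research/development iteration (hypothetical).
    """
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    history = []
    for n in range(rounds):
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            arm = untried[0]  # play every arm once before using confidence bounds
        else:
            # Mean reward plus an exploration bonus that shrinks as an arm is used.
            arm = max(arms, key=lambda a: totals[a] / counts[a]
                      + c * math.sqrt(math.log(n) / counts[a]))
        r = reward_fn(arm, n)
        counts[arm] += 1
        totals[arm] += r
        history.append(arm)
    return history, counts

history, counts = ucb1_schedule(lambda arm, n: 0.8 if arm == "factor" else 0.2, rounds=50)
```

The exploration bonus keeps the weaker direction from being abandoned entirely, which matters when the payoff of factor work versus model work shifts as the research progresses.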
As the stock market is a cornerstone of financial markets, forecasting stock price movements is among the foremost challenges in quantitative finance. Emerging learning-based approaches have made significant progress in capturing the intricate and ever-evolving data patterns of modern markets. As the stock market expands rapidly, two characteristics, stock exogeneity and volatility heterogeneity, increase the complexity of price forecasting. Stock exogeneity reflects the influence of external market factors on price movements, while volatility heterogeneity reflects how the difficulty of forecasting movements varies with the magnitude of price fluctuations. In this work, we introduce the Cross-market Synergy with Pseudo-volatility Optimization (CSPO) framework. CSPO uses an effective deep neural architecture to leverage knowledge from external futures markets, enriching stock embeddings with cross-market insights and thus enhancing its predictive capability. Furthermore, CSPO incorporates pseudo-volatility to model stock-specific forecasting confidence, dynamically adapting its optimization process to improve accuracy and robustness. Extensive experiments, including industrial evaluation and public benchmarks, show that CSPO outperforms existing methods and confirm the effectiveness of each proposed module.
https://arxiv.org/abs/2503.22740
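CSPO uses pseudo-volatility to express stock-specific forecasting confidence inside the training objective, but the abstract does not give its formula. As a clearly swapped-in stand-in, the sketch below uses the standard heteroscedastic Gaussian negative log-likelihood, where a per-sample log-variance down-weights errors on hard-to-forecast samples; it illustrates confidence-weighted optimization in general, not CSPO's actual loss.

```python
import math

def heteroscedastic_loss(y_true, y_pred, log_var):
    """Mean Gaussian negative log-likelihood (constants dropped) with per-sample log-variance.

    A large log_var (low confidence, e.g. a very volatile stock) down-weights that
    sample's squared error; the + 0.5 * log_var term keeps the model from marking
    every sample as uncertain.
    """
    total = 0.0
    for y, p, s in zip(y_true, y_pred, log_var):
        total += 0.5 * math.exp(-s) * (y - p) ** 2 + 0.5 * s
    return total / len(y_true)
```

For a fixed error e, the per-sample loss is minimized at log_var = log(e²), so a model trained this way learns to report higher uncertainty exactly where its errors are larger.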
Reinforcement Learning (RL) has experienced significant advancement over the past decade, prompting a growing interest in applications within finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing test-bed for RL. Traditional finance offers certain solutions, and RL advances these with a more dynamic approach, incorporating machine learning methods, including transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of Quantitative Finance. We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.
https://arxiv.org/abs/2408.10932
Exploring complex adaptive financial trading environments through multi-agent simulation is an innovative approach within quantitative finance. Although multi-agent reinforcement learning dominates in financial markets with observable data, a set of systemically significant financial markets poses challenges because their data are only partially available or obscured. We therefore devise a multi-agent simulation approach employing small-scale meta-heuristic methods. This approach aims to represent the opaque bilateral market for Australian government bond trading, capturing the bilateral nature of bank-to-bank trading, also referred to as "over-the-counter" (OTC) trading, which commonly occurs between "market makers". The uniqueness of the bilateral market, characterized by negotiated transactions and a limited number of agents, yields valuable insights for agent-based modelling and quantitative finance. The inherent rigidity of this market structure, at odds with the global proliferation of multilateral platforms and the decentralization of finance, underscores the unique insights offered by our agent-based model. We explore the implications of market rigidity for market structure and consider the role of stability in market design. This extends the ongoing discourse on complex financial trading environments, providing an enhanced understanding of their dynamics and implications.
https://arxiv.org/abs/2405.02849