Reinforcement Learning (RL) has advanced significantly over the past decade, prompting growing interest in its applications to finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing testbed for RL. Traditional finance offers certain solutions, and RL extends them with a more dynamic approach that incorporates machine learning methods such as transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of quantitative finance. We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.
https://arxiv.org/abs/2408.10932
Exploring complex adaptive financial trading environments through multi-agent simulation methods presents an innovative approach within quantitative finance. Although multi-agent reinforcement learning approaches dominate in financial markets with observable data, a set of systemically significant financial markets poses challenges because their data are only partially available or obscured. We therefore devise a multi-agent simulation approach employing small-scale meta-heuristic methods. This approach aims to represent the opaque bilateral market for Australian government bond trading, capturing the bilateral nature of bank-to-bank trading, also referred to as "over-the-counter" (OTC) trading, which commonly occurs between "market makers". The uniqueness of the bilateral market, characterized by negotiated transactions and a limited number of agents, yields valuable insights for agent-based modelling and quantitative finance. The inherent rigidity of this market structure, which is at odds with the global proliferation of multilateral platforms and the decentralization of finance, underscores the unique insights offered by our agent-based model. We explore the implications of market rigidity for market structure and consider the role of stability in market design. This extends the ongoing discourse on complex financial trading environments, providing an enhanced understanding of their dynamics and implications.
https://arxiv.org/abs/2405.02849
This research paper delves into the application of Deep Reinforcement Learning (DRL) in asset-class-agnostic portfolio optimization, integrating industry-grade methodologies with quantitative finance. At the heart of this integration is our robust framework, which not only merges advanced DRL algorithms with modern computational techniques but also emphasizes stringent statistical analysis, software engineering, and regulatory compliance. To the best of our knowledge, this is the first study integrating financial Reinforcement Learning with sim-to-real methodologies from robotics and mathematical physics, thus enriching our frameworks and arguments with this unique perspective. Our research culminates in the introduction of AlphaOptimizerNet, a proprietary Reinforcement Learning agent (and corresponding library). Developed from a synthesis of state-of-the-art (SOTA) literature and our unique interdisciplinary methodology, AlphaOptimizerNet demonstrates encouraging risk-return optimization across various asset classes with realistic constraints. These preliminary results underscore the practical efficacy of our frameworks. As the finance sector increasingly gravitates towards advanced algorithmic solutions, our study bridges theoretical advancements with real-world applicability, offering a template for ensuring safety and robust standards in this technologically driven future.
https://arxiv.org/abs/2403.07916
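AlphaOptimizerNet is proprietary and its internals are not described in the abstract. As a generic illustration of how DRL portfolio agents are often wired to a market, the sketch below maps an agent's raw action to long-only weights with a softmax and rewards it with the log growth of wealth net of proportional transaction costs. The function names, the softmax parameterisation and the `cost_rate` value are assumptions, not the paper's design.

```python
import numpy as np

def softmax(logits):
    """Map unconstrained agent outputs to long-only weights that sum to 1."""
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def portfolio_step(prev_weights, action_logits, price_relatives, cost_rate=0.001):
    """One step of a hypothetical long-only portfolio environment.

    prev_weights:    current allocation, summing to 1
    action_logits:   raw agent action, one entry per asset
    price_relatives: p_{t+1} / p_t for each asset
    cost_rate:       assumed proportional transaction cost per unit of turnover
    Returns (weights after price drift, log-growth reward).
    """
    target = softmax(np.asarray(action_logits, dtype=float))
    turnover = np.abs(target - prev_weights).sum()
    gross = float(target @ price_relatives)
    reward = np.log(gross * (1.0 - cost_rate * turnover))
    # Prices move after rebalancing, so holdings drift away from the target.
    drifted = target * price_relatives / gross
    return drifted, reward
```

For example, starting from weights `[0.5, 0.5]` with zero logits and price relatives `[1.1, 0.9]`, the weights drift to roughly `[0.55, 0.45]` and the reward is zero because the two moves cancel. A log-growth reward is a common choice because per-step rewards then sum to the log of total wealth growth over an episode.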
Recent advancements in large language models (LLMs) have opened new pathways in many domains. However, the full potential of LLMs in financial investment remains largely untapped. Typical deep learning-based methods for quantitative finance face two main challenges. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional methods lack clarity and interpretability, which impedes their application in scenarios where a justification for each prediction is essential. To address these challenges, we propose Ploutos, a novel financial LLM framework consisting of PloutosGen and PloutosGPT. PloutosGen contains multiple primary experts that analyze data of different modalities, such as text and numbers, and provide quantitative strategies from different perspectives. PloutosGPT then combines their insights and predictions and generates interpretable rationales. To generate accurate and faithful rationales, PloutosGPT's training strategy leverages a rearview-mirror prompting mechanism that guides GPT-4 to generate rationales, and a dynamic token weighting mechanism that fine-tunes the LLM by increasing the weight of key tokens. Extensive experiments show that our framework outperforms state-of-the-art methods in both prediction accuracy and interpretability.
https://arxiv.org/abs/2403.00782
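The Ploutos abstract does not spell out how its dynamic token weighting is computed. A minimal sketch of the general idea is a weighted negative log-likelihood in which key tokens count more toward the fine-tuning loss. It assumes a per-token mask of key tokens is already available and uses a fixed up-weighting factor, whereas the paper's weights may change during training. `weighted_token_nll`, `key_mask` and `key_weight` are hypothetical names.

```python
import numpy as np

def weighted_token_nll(log_probs, target_ids, key_mask, key_weight=2.0):
    """Weighted negative log-likelihood over one target sequence.

    log_probs:  (T, V) array of next-token log-probabilities from the model
    target_ids: (T,) gold token ids
    key_mask:   (T,) booleans marking key tokens (selection rule assumed given)
    key_weight: multiplier applied to key tokens; other tokens get weight 1
    """
    target_ids = np.asarray(target_ids)
    # Pick the log-probability the model assigned to each gold token.
    token_nll = -log_probs[np.arange(len(target_ids)), target_ids]
    weights = np.where(np.asarray(key_mask), key_weight, 1.0)
    # Normalise by total weight so the loss scale stays comparable to plain NLL.
    return float((weights * token_nll).sum() / weights.sum())
```

With `key_weight=1.0`, or with no key tokens marked, the function reduces to the ordinary mean token NLL, which is a convenient sanity check.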
Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving excellent performance without significant manual effort. However, we observe that DRL models behave unstably in a dynamic stock market because financial data have a low signal-to-noise ratio. In this paper, we propose a novel logic-guided trading framework, termed SYENS (Program Synthesis-based Ensemble Strategy). Unlike the previous state-of-the-art ensemble reinforcement learning strategy, which arbitrarily selects the best-performing agent for testing based on a single measurement, our framework regularizes the model's behavior hierarchically using the program-synthesis-by-sketching paradigm. First, we propose a high-level, domain-specific language (DSL) for describing the market environment and actions. Then, based on the DSL, we introduce a novel program sketch that embeds human expert knowledge in a logical manner. Finally, based on the program sketch, we adopt the program-synthesis-by-sketching paradigm to synthesize a logical, hierarchical trading strategy. We evaluate SYENS on the 30 Dow Jones stocks under cash trading and margin trading settings. Experimental results demonstrate that our framework significantly outperforms the baselines, with much higher cumulative return and lower maximum drawdown under both settings.
https://arxiv.org/abs/2310.05551
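SYENS is evaluated by cumulative return and maximum drawdown. The DSL and program sketch are not specified in enough detail to reproduce here. The two metrics, however, are standard and can be computed from a series of portfolio values as below. This is a generic sketch, not the authors' evaluation code.

```python
def cumulative_return(values):
    """Total return over the whole period, e.g. 0.17 for +17%."""
    return values[-1] / values[0] - 1.0

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # current decline from that peak
    return worst
```

For the value path `[100, 120, 90, 130, 117]`, the cumulative return is 17% and the maximum drawdown is 25% (the fall from 120 to 90), even though the path ends above its start. This is why the two metrics are reported together.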
One of the problems in quantitative finance that has received the most attention is portfolio optimization. It has been approached with a variety of techniques, with those related to quantum computing being especially prolific in recent years. In this study, we present a system called Quantum Computing-based System for Portfolio Optimization with Future Asset Values and Automatic Universe Reduction (Q4FuturePOP), which tackles the portfolio optimization problem with the following innovations: i) the tool is designed to work with predicted future asset values instead of historical values; and ii) Q4FuturePOP includes an automatic universe reduction module, conceived to intelligently reduce the complexity of the problem. We also briefly discuss the preliminary performance of the modules that make up the prototype version of Q4FuturePOP.
https://arxiv.org/abs/2309.12627
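Quantum approaches to portfolio optimization commonly cast asset selection as a QUBO (quadratic unconstrained binary optimization) problem. The task is to choose a binary vector $x$ that minimizes $q\,x^\top \Sigma x - \mu^\top x + \lambda\left(\sum_i x_i - B\right)^2$. Here $\mu$ holds expected (in Q4FuturePOP's case, predicted) returns, $\Sigma$ is the covariance, $q$ is a risk-aversion weight, $B$ is the number of assets to hold and $\lambda$ is a penalty weight. The abstract does not state Q4FuturePOP's exact formulation, so this is an assumed, textbook-style version.

The sketch solves the problem by brute force, which is only feasible for a handful of assets. This exhaustive search is what a quantum annealer or QAOA would approximate. `reduce_universe` is a hypothetical stand-in for an automatic universe-reduction step: it keeps the assets with the best predicted return per unit of volatility.

```python
import itertools
import numpy as np

def qubo_objective(x, mu, sigma, q=0.5, budget=2, penalty=10.0):
    """Risk minus predicted return, plus a penalty for not holding exactly `budget` assets."""
    x = np.asarray(x, dtype=float)
    return q * (x @ sigma @ x) - mu @ x + penalty * (x.sum() - budget) ** 2

def brute_force_portfolio(mu, sigma, **kwargs):
    """Enumerate all 2**n binary selections; only viable for very small n."""
    candidates = itertools.product([0, 1], repeat=len(mu))
    best = min(candidates, key=lambda x: qubo_objective(x, mu, sigma, **kwargs))
    return list(best)

def reduce_universe(mu, sigma, keep):
    """Hypothetical pre-filter: indices of the `keep` assets with the best mu / volatility."""
    score = mu / np.sqrt(np.diag(sigma))
    return sorted(np.argsort(score)[::-1][:keep].tolist())
```

Shrinking the universe first matters because the search space, and the number of qubits a quantum formulation needs, grows with the number of candidate assets.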
We present InvestLM, a new financial domain large language model tuned on LLaMA-65B (Touvron et al., 2023) using a carefully curated instruction dataset related to financial investment. Inspired by less-is-more-for-alignment (Zhou et al., 2023), we manually curate a small yet diverse instruction dataset covering a wide range of finance-related topics, from Chartered Financial Analyst (CFA) exam questions to SEC filings to Stack Exchange quantitative finance discussions. InvestLM shows strong capabilities in understanding financial text and provides helpful responses to investment-related questions. Financial experts, including hedge fund managers and research analysts, rate InvestLM's responses as comparable to those of state-of-the-art commercial models (GPT-3.5, GPT-4 and Claude-2). Zero-shot evaluation on a set of financial NLP benchmarks demonstrates strong generalizability. From a research perspective, this work suggests that a high-quality domain-specific LLM can be tuned from a well-trained foundation model using a small set of carefully curated instructions, which is consistent with the Superficial Alignment Hypothesis (Zhou et al., 2023). From a practical perspective, this work develops a state-of-the-art financial domain LLM with superior capability in understanding financial texts and providing helpful investment advice, potentially enhancing the work efficiency of financial professionals. We release the model parameters to the research community.
https://arxiv.org/abs/2309.13064
Order execution is a fundamental task in quantitative finance, aiming to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the common practice of executing multiple orders simultaneously, which results in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that considers practical constraints. Specifically, we treat each agent as an individual operator that trades one specific order, while the agents keep communicating and collaborating with each other to maximize overall profit. Nevertheless, existing MARL algorithms often implement communication among agents by exchanging only information about their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol in which agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method, which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show that our method achieves superior performance with significantly better collaboration effectiveness.
https://arxiv.org/abs/2307.03119
One of the most fundamental questions in quantitative finance is the existence of continuous-time diffusion models that fit the market prices of a given set of options. Traditionally, one employs a mix of intuition and theoretical and empirical analysis to find models that achieve exact or approximate fits. Our contribution is to show how a suitable game-theoretic formulation of this problem can help answer this question by leveraging existing developments in modern deep multi-agent reinforcement learning to search the space of stochastic processes. More importantly, we hope that our techniques can be leveraged and extended by the community to solve important problems in the field, such as the joint SPX-VIX calibration problem. Our experiments show that we are able to learn local volatility, as well as the path-dependence in the volatility process required to minimize the price of a Bermudan option. In one sentence, our algorithm can be seen as a particle method à la Guyon and Henry-Labordère where particles, instead of being designed to ensure $\sigma_{\mathrm{loc}}(t,S_t)^2 = \mathbb{E}[\sigma_t^2 \mid S_t]$, are learning RL-driven agents cooperating towards more general calibration targets. This is the first work bridging reinforcement learning with the derivative calibration problem.
https://arxiv.org/abs/2203.06865
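The classical constraint mentioned in this abstract, $\sigma_{\mathrm{loc}}(t,S_t)^2 = \mathbb{E}[\sigma_t^2 \mid S_t]$, involves a conditional expectation that particle methods estimate from a cloud of simulated paths. A common estimator is Nadaraya-Watson kernel regression with a Gaussian kernel. At each spot level $s$, it averages the particles' instantaneous variances with weights that decay with the distance between $s$ and each particle's spot. Treat this as an assumption rather than the paper's procedure. The function name and bandwidth are illustrative.

```python
import numpy as np

def conditional_variance(spots, inst_vars, grid, bandwidth):
    """Nadaraya-Watson estimate of E[sigma_t^2 | S_t = s] for each s in `grid`.

    spots:     (N,) particle spot values S_t
    inst_vars: (N,) particle instantaneous variances sigma_t^2
    grid:      (G,) spot levels at which to evaluate the estimate
    bandwidth: kernel width, in the same units as the spot
    """
    spots = np.asarray(spots, dtype=float)
    inst_vars = np.asarray(inst_vars, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Gaussian kernel weights, shape (G, N): nearby particles count more.
    z = (grid[:, None] - spots[None, :]) / bandwidth
    w = np.exp(-0.5 * z ** 2)
    return (w @ inst_vars) / w.sum(axis=1)
```

The bandwidth trades bias for variance. If it is too small, each estimate rests on very few particles. If it is too large, the estimate smears out the dependence on the spot.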
Market regimes are a popular topic in quantitative finance, even though there is little consensus on how exactly they should be defined. They arise as a feature both in financial market prediction problems and in financial market task-performing problems. In this work, we use discrete event time multi-agent market simulation to experiment freely in a reproducible and understandable environment where regimes can be explicitly switched and enforced. We introduce a novel stochastic process to model the fundamental value perceived by market participants: the Continuous-Time Markov Switching Trending Ornstein-Uhlenbeck (CTMSTOU) process, which facilitates the study of trading policies in regime-switching markets. We also define the notion of regime-awareness for a trading agent and illustrate its importance by studying different order placement strategies in the context of order execution problems.
https://arxiv.org/abs/2202.00941
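The abstract names the CTMSTOU process but does not give its equations. The sketch below simulates one plausible reading, which should be treated as an assumption. A continuous-time Markov chain with generator matrix $Q$ selects the regime, and the mean-reversion level drifts with a regime-specific slope (the "trend"). The value then reverts to that moving level as an Ornstein-Uhlenbeck process. Regime switches use the first-order approximation that the chain leaves state $k$ within a step $dt$ with probability about $-Q_{kk}\,dt$, so $dt$ must be small relative to the switching rates. All parameter names are hypothetical.

```python
import numpy as np

def simulate_ctmstou(T, dt, Q, slopes, kappa, sigma, x0, seed=0):
    """Euler sketch of a regime-switching trending OU process (assumed form).

    Q:      (K, K) generator matrix; rows sum to zero, off-diagonals are switch rates
    slopes: (K,) drift of the mean-reversion level in each regime
    kappa:  mean-reversion speed; sigma: noise volatility
    Returns the simulated values and the regime index at each time step.
    """
    Q = np.asarray(Q, dtype=float)
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.empty(n + 1)
    regimes = np.empty(n + 1, dtype=int)
    x[0], level, k = x0, x0, 0
    regimes[0] = k
    for i in range(n):
        # Leave regime k with probability ~ -Q[k, k] * dt (valid for small dt).
        if rng.random() < -Q[k, k] * dt:
            rates = Q[k].copy()
            rates[k] = 0.0
            k = int(rng.choice(len(rates), p=rates / rates.sum()))
        level += slopes[k] * dt  # the level the process reverts to trends with the regime
        shock = sigma * np.sqrt(dt) * rng.standard_normal()
        x[i + 1] = x[i] + kappa * (level - x[i]) * dt + shock
        regimes[i + 1] = k
    return x, regimes
```

Because the regime path is returned alongside the values, a simulation like this lets one compare a regime-aware agent, which is given or infers the regime, against one that ignores it.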
We propose a methodology, based on conditional generative adversarial networks, to approximate conditional distributions in the elliptope of correlation matrices. We illustrate the methodology with an application from quantitative finance: Monte Carlo simulations of correlated returns to compare risk-based portfolio construction methods. Finally, we discuss current limitations and advocate further exploration of the elliptope geometry to improve results.
https://arxiv.org/abs/2107.10606
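The elliptope is the set of n-by-n correlation matrices: symmetric, positive semidefinite, with ones on the diagonal. The conditional GAN is beyond a short example, but two surrounding pieces are easy to sketch. One is a membership check for the elliptope. The other is the Monte Carlo step that turns a correlation matrix into correlated returns via a Cholesky factor. The Gaussian returns and fixed per-asset volatilities are assumptions for illustration.

```python
import numpy as np

def in_elliptope(C, tol=1e-10):
    """True if C is symmetric, has a unit diagonal and is positive semidefinite."""
    C = np.asarray(C, dtype=float)
    if not np.allclose(C, C.T, atol=tol):
        return False
    if not np.allclose(np.diag(C), 1.0, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(C).min() >= -tol)

def simulate_correlated_returns(C, vols, n_samples, seed=0):
    """Draw Gaussian returns with correlation C and standard deviations `vols`.

    np.linalg.cholesky needs C to be strictly positive definite, i.e. in the
    interior of the elliptope rather than on its boundary.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.asarray(C, dtype=float))
    z = rng.standard_normal((n_samples, len(vols)))
    return (z @ L.T) * np.asarray(vols, dtype=float)
```

In a setup like the one described, the fixed matrix passed to `simulate_correlated_returns` would instead be sampled from the learned conditional distribution. Each Monte Carlo scenario could then carry a different correlation structure.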