Abstract
Extracting signals through alpha factor mining is a fundamental challenge in quantitative finance. Existing automated methods primarily follow two paradigms: Decoupled Factor Generation, which treats each factor discovery as an isolated event, and Iterative Factor Evolution, which focuses on local parent-child refinements. Both paradigms lack a global structural view: they treat the factor pool as an unstructured collection or as fragmented chains, which leads to redundant search and limited diversity. To address these limitations, we introduce AlphaPROBE (Alpha Mining via Principled Retrieval and On-graph Biased Evolution), a framework that reframes alpha mining as the strategic navigation of a Directed Acyclic Graph (DAG). By modeling factors as nodes and evolutionary links as edges, AlphaPROBE treats the factor pool as a dynamic, interconnected ecosystem. The framework consists of two core components: a Bayesian Factor Retriever, which identifies high-potential seeds by balancing exploitation and exploration through a posterior probability model, and a DAG-aware Factor Generator, which leverages the full ancestral trace of each factor to produce context-aware, non-redundant optimizations. Extensive experiments on three major Chinese stock market datasets against eight competitive baselines demonstrate that AlphaPROBE achieves significantly better predictive accuracy, return stability, and training efficiency. Our results confirm that leveraging global evolutionary topology is essential for efficient and robust automated alpha discovery. We have open-sourced our implementation.
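To make the DAG view concrete, the following is a minimal illustrative sketch and not the AlphaPROBE implementation. It assumes a toy factor pool in which each node records its parents, and it stands in for the Bayesian Factor Retriever with plain Thompson sampling over Beta posteriors, a technique chosen here only for illustration. The names `FactorNode`, `FactorDAG`, `ancestral_trace`, `record`, and `retrieve_seed`, the success/failure counts, and the factor expressions are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FactorNode:
    """One factor in the pool; `parents` names the factors it was derived from."""
    name: str
    expression: str
    parents: list = field(default_factory=list)
    successes: int = 0  # assumed signal: refinements of this factor that helped
    failures: int = 0   # assumed signal: refinements that did not help


class FactorDAG:
    """Toy factor pool stored as a DAG (hypothetical structure)."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, expression, parents=()):
        # Parents must already exist and names are unique, so every edge
        # points from an older node to a newer one and no cycle can form.
        if name in self.nodes:
            raise ValueError(f"duplicate factor: {name}")
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent: {parent}")
        self.nodes[name] = FactorNode(name, expression, list(parents))

    def ancestral_trace(self, name):
        """Return every ancestor of `name`, nearest first (breadth-first)."""
        seen, order = set(), []
        frontier = list(self.nodes[name].parents)
        while frontier:
            current = frontier.pop(0)
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            frontier.extend(self.nodes[current].parents)
        return order

    def record(self, name, improved):
        """Update the outcome counts of a seed after one refinement attempt."""
        node = self.nodes[name]
        if improved:
            node.successes += 1
        else:
            node.failures += 1

    def retrieve_seed(self, rng):
        """Thompson sampling: draw once from each node's Beta posterior
        (uniform Beta(1, 1) prior) and return the node with the largest draw."""
        def draw(node):
            return rng.betavariate(1 + node.successes, 1 + node.failures)
        return max(self.nodes.values(), key=draw).name


# Usage with made-up factor expressions.
dag = FactorDAG()
dag.add("f0", "close / delay(close, 5) - 1")
dag.add("f1", "rank(close / delay(close, 5) - 1)", parents=["f0"])
dag.add("f2", "rank(close / delay(close, 5) - 1) * volume", parents=["f1"])

print(dag.ancestral_trace("f2"))  # ['f1', 'f0']
print(dag.retrieve_seed(random.Random(0)))  # one of 'f0', 'f1', 'f2'
```

Sampling from the posterior rather than taking its mean is what provides exploration in this sketch: a node with few recorded outcomes has a wide posterior and is still selected occasionally, while a node with many successes is selected most of the time. In a generator built on such a pool, the ancestral trace is the context that would be passed along with the seed, so that a new refinement can avoid repeating steps already taken along that lineage.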
URL
https://arxiv.org/abs/2602.11917