Abstract
Click-through-rate (CTR) prediction plays an important role in online advertising and ad recommender systems. Over the past decade, maximizing CTR has been the main focus of model development, and researchers and practitioners have proposed a wide range of models and solutions to make CTR prediction more effective. Most of the existing literature focuses on capturing either implicit or explicit feature interactions. Although some studies capture implicit interactions successfully, capturing explicit interactions, which requires extracting both low-order and high-order feature interactions, remains a challenge for achieving high CTR. Unnecessary and irrelevant features may increase computation time and degrade prediction performance. Furthermore, certain features may perform well with specific predictive models while underperforming with others, and feature distributions may fluctuate due to traffic variations. Most importantly, resources are limited in live production environments, where inference time is just as crucial as training time. For all these reasons, feature selection is one of the most important factors in enhancing CTR prediction model performance. Simple filter-based feature selection algorithms do not perform well enough for this purpose. An effective and efficient feature selection algorithm is needed to consistently select the most useful features during the live CTR prediction process. In this paper, we propose a heuristic algorithm named Neighborhood Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR prediction performance while reducing dimensionality and training time costs. We conduct comprehensive experiments on three public datasets to validate the efficiency and effectiveness of our proposed solution.
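The abstract does not describe how NeSHFS works internally, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea of neighborhood search over feature subsets, assuming a caller-supplied scoring function. The names `neighborhood_search`, `toy_score`, `FEATURES`, and `USEFUL` are hypothetical, and the toy score stands in for what would presumably be a validation metric of a CTR model trained on the candidate subset.

```python
import random
from typing import Callable, FrozenSet, Sequence


def neighborhood_search(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_iters: int = 100,
    seed: int = 0,
) -> FrozenSet[str]:
    """Generic first-improvement local search over feature subsets.

    Illustrative only: this is NOT the NeSHFS procedure from the paper.
    """
    rng = random.Random(seed)
    current = frozenset(features)  # start from the full feature set
    best = score(current)
    for _ in range(max_iters):
        candidates = list(features)
        rng.shuffle(candidates)
        improved = False
        for f in candidates:
            # A neighbor differs from the current subset by exactly one
            # feature: f is removed if present, added if absent.
            neighbor = current ^ {f}
            if not neighbor:
                continue  # never evaluate the empty subset
            s = score(neighbor)
            if s > best:
                current, best, improved = neighbor, s, True
                break  # accept the first improving move
        if not improved:
            break  # no neighbor is better: local optimum reached
    return current


# Hypothetical toy setup: three informative features, two noise features.
FEATURES = ["user_id", "ad_id", "hour", "noise_a", "noise_b"]
USEFUL = {"user_id", "ad_id", "hour"}


def toy_score(subset: FrozenSet[str]) -> float:
    # Assumed stand-in for a validation metric: reward informative
    # features and charge a small cost per selected feature.
    return sum(1.0 for f in subset if f in USEFUL) - 0.1 * len(subset)


selected = neighborhood_search(FEATURES, toy_score)
print(sorted(selected))
```

In this toy setting the search drops the two noise features and keeps the informative ones, because each removal of a noise feature raises the score while removing an informative feature lowers it. With a real model, every call to `score` would involve training and evaluation, which is why the number of evaluated neighbors matters for training time.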
URL
https://arxiv.org/abs/2409.08703