Abstract
Online advertising is ubiquitous in web business. Image displaying is considered as one of the most commonly used formats to interact with customers. Contextual multi-armed bandit has shown success in the application of advertising to solve the exploration-exploitation dilemma existed in the recommendation procedure. Inspired by the visual-aware advertising, in this paper, we propose a contextual bandit algorithm, where the convolutional neural network (CNN) is utilized to learn the reward function along with an upper confidence bound (UCB) for exploration. We also prove a near-optimal regret bound $\tilde{\mathcal{O}}(\sqrt{T})$ when the network is over-parameterized and establish strong connections with convolutional neural tangent kernel (CNTK). Finally, we evaluate the empirical performance of the proposed algorithm and show that it outperforms other state-of-the-art UCB-based bandit algorithms on real-world image data sets.
Abstract (translated)
URL
https://arxiv.org/abs/2107.07438