Abstract
In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at this https URL.
Abstract (translated)
在本文中,我们提出了OmniSearchSage,一个用于理解Pinterest搜索查询、列表和产品的多功能且可扩展的系统。我们共同学习了一个统一的查询嵌入与Pin和产品嵌入相结合,导致了在Pinterest的生成搜索系统中的相关性提高8%以上,参与度提高7%以上,广告CTR提高5%以上。这些增长的主要贡献是改进的内容理解、更好的多任务学习和实时 serving。我们通过从图像标题中提取多样性的文本来丰富实体表示,包括历史参与度和用户创建的列表。我们的多任务学习设置在相同的空间中产生了与Pin和产品嵌入兼容的单个搜索查询嵌入,并支持预先存在的Pin和产品嵌入。我们通过消融研究来展示每个功能的价值,并比较统一的模型与分离的模型的效果。最后,我们分享了这些嵌入如何应用于Pinterest搜索链的各个环节,从检索到排名,以及在低延迟下每秒服务300k请求的效果。本文中我们所做的工作的实现版本请点击以下链接获取:https://www.aclweb.org/anthology/W19-6231
URL
https://arxiv.org/abs/2404.16260