Abstract
User preference prediction requires a comprehensive and accurate understanding of individual tastes, covering both surface-level attributes, such as color and style, and deeper content-related aspects, such as themes and composition. However, existing methods typically rely on general human preferences or assume static user profiles, neglecting individual variability and the dynamic, multifaceted nature of personal taste. To address these limitations, we propose an approach built upon Multimodal Large Language Models, introducing a contrastive preference loss and learnable preference tokens to learn personalized user preferences from historical interactions. The contrastive preference loss is designed to effectively distinguish between user "likes" and "dislikes", while the learnable preference tokens capture shared interest representations among existing users, enabling the model to activate group-specific preferences and enhance consistency across similar users. Extensive experiments demonstrate that our model outperforms existing methods in preference prediction accuracy, effectively identifies users with similar aesthetic inclinations, and provides more precise guidance for generating images that align with individual tastes. The project page is at this https URL.
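The abstract does not spell out the two components mathematically, so the following is only a minimal PyTorch sketch of how they could look under common assumptions: the contrastive preference loss is written as a pairwise logistic loss that pushes a user's "like" scores above their "dislike" scores, and the preference tokens as a learnable token bank that a user's history embedding softly attends over. The class and function names (`PreferenceTokens`, `contrastive_preference_loss`), the scoring head, and all dimensions are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): pairwise contrastive preference loss
# plus a learnable preference-token bank, assuming the MLLM yields a scalar
# preference score per (user history, candidate image) pair.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreferenceTokens(nn.Module):
    """Hypothetical bank of learnable tokens capturing shared interests.

    A soft assignment over the bank lets users with similar histories
    activate similar group-level preference representations.
    """

    def __init__(self, num_tokens: int = 16, dim: int = 4096):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)

    def forward(self, user_embedding: torch.Tensor) -> torch.Tensor:
        # user_embedding: (batch, dim) summary of a user's interaction history
        weights = F.softmax(user_embedding @ self.tokens.T, dim=-1)  # (batch, num_tokens)
        return weights @ self.tokens  # (batch, dim) group-conditioned preference vector


def contrastive_preference_loss(score_like: torch.Tensor,
                                score_dislike: torch.Tensor) -> torch.Tensor:
    """Pairwise logistic loss: push 'liked' scores above 'disliked' scores.

    score_like / score_dislike: (batch,) scalar scores for a liked and a
    disliked image of the same user.
    """
    return -F.logsigmoid(score_like - score_dislike).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    bank = PreferenceTokens(num_tokens=16, dim=64)
    user_hist = torch.randn(4, 64)                     # stand-in history embeddings
    pref_vec = bank(user_hist)                         # group-conditioned preference
    s_like = (pref_vec * torch.randn(4, 64)).sum(-1)   # stand-in "like" scores
    s_dislike = (pref_vec * torch.randn(4, 64)).sum(-1)
    print(contrastive_preference_loss(s_like, s_dislike).item())
```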
URL
https://arxiv.org/abs/2508.08220