This paper presents ReasoningRec, a reasoning-based recommendation framework that leverages Large Language Models (LLMs) to bridge the gap between recommendations and human-interpretable explanations. In contrast to conventional recommendation systems that rely on implicit user-item interactions, ReasoningRec employs LLMs to model users and items, focusing on preferences, aversions, and explanatory reasoning. The framework uses a larger LLM to generate synthetic explanations of user preferences, which are subsequently used to fine-tune a smaller LLM for improved recommendation accuracy and human-interpretable explanations. Our experimental study investigates the impact of reasoning and contextual information on personalized recommendations, revealing that the quality of contextual and personalized data significantly influences the LLM's capacity to generate plausible explanations. Empirical evaluations demonstrate that ReasoningRec surpasses state-of-the-art methods by up to 12.5\% in recommendation prediction while concurrently providing human-intelligible explanations. The code is available here: this https URL.
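As a concrete illustration of the fine-tuning data such a pipeline might use, the sketch below packages a user's likes and dislikes, a candidate item, the ground-truth label, and an LLM-generated explanation into a single supervised fine-tuning record. The function name, prompt wording, and JSON fields are hypothetical, not taken from the ReasoningRec code.

```python
# Hypothetical sketch: packaging an LLM-generated explanation into a
# supervised fine-tuning record for a smaller model. Field names and the
# prompt wording are illustrative, not taken from the ReasoningRec paper.
import json

def build_sft_example(liked, disliked, candidate, label, explanation):
    """Return one JSON-lines record: the prompt asks for a yes/no prediction
    plus reasoning; the target combines the ground-truth label with the
    synthetic explanation produced by the larger LLM."""
    prompt = (
        "The user liked: " + "; ".join(liked) + ".\n"
        "The user disliked: " + "; ".join(disliked) + ".\n"
        f"Will the user enjoy '{candidate}'? Answer Yes or No and explain why."
    )
    target = f"{'Yes' if label else 'No'}. {explanation}"
    return json.dumps({"prompt": prompt, "response": target})

record = build_sft_example(
    liked=["Inception", "Interstellar"],
    disliked=["Grown Ups"],
    candidate="Tenet",
    label=True,
    explanation="The user consistently prefers cerebral sci-fi thrillers.",
)
print(record)
```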
https://arxiv.org/abs/2410.23180
Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes. However, current LLM-based approaches struggle to fully leverage user behavior sequences, resulting in suboptimal preference modeling for personalized recommendations. In this study, we propose a novel Counterfactual Fine-Tuning (CFT) method to address this issue by explicitly emphasizing the role of behavior sequences when generating recommendations. Specifically, we employ counterfactual reasoning to identify the causal effects of behavior sequences on model output and introduce a task that directly fits the ground-truth labels based on these effects, achieving the goal of explicit emphasis. Additionally, we develop a token-level weighting mechanism to adjust the emphasis strength for different item tokens, reflecting the diminishing influence of behavior sequences from earlier to later tokens when predicting an item. Extensive experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling. Our codes are available at this https URL.
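A minimal sketch of the token-level weighting idea: item tokens earlier in the generated identifier receive larger loss weights, reflecting that the behavior sequence matters most for the first tokens. The exponential decay schedule and function name are assumptions for illustration, not CFT's exact weighting.

```python
# Minimal sketch of token-level weighted cross-entropy for the item tokens
# of a generative recommender. The exponential decay schedule is an
# assumption for illustration, not the exact weighting used by CFT.
import torch
import torch.nn.functional as F

def weighted_item_token_loss(logits, targets, decay=0.8):
    """logits: (T, V) scores for the T tokens of the target item;
    targets: (T,) token ids. Earlier tokens, which depend most on the
    behavior sequence, receive larger weights."""
    T = targets.size(0)
    weights = decay ** torch.arange(T, dtype=logits.dtype)   # 1, d, d^2, ...
    token_nll = F.cross_entropy(logits, targets, reduction="none")  # (T,)
    return (weights * token_nll).sum() / weights.sum()

logits = torch.randn(5, 32000)          # 5 item tokens, toy vocabulary of 32k
targets = torch.randint(0, 32000, (5,))
print(weighted_item_token_loss(logits, targets).item())
```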
https://arxiv.org/abs/2410.22809
Sequential recommender systems (SRSs) aim to predict the subsequent items that may interest users by comprehensively modeling the complex preferences embedded in the sequence of user-item interactions. However, most existing SRSs model only users' low-level preferences based on item ID information while ignoring the high-level preferences revealed by item attribute information, such as item category. Furthermore, they often utilize limited sequence context information to predict the next item while overlooking richer inter-item semantic relations. To this end, in this paper, we propose a novel hierarchical preference modeling framework that substantially models the complex low- and high-level preference dynamics for accurate sequential recommendation. Specifically, the framework includes a novel dual-transformer module and a novel dual contrastive learning scheme, designed to discriminatively learn users' low- and high-level preferences and to effectively enhance both low- and high-level preference learning, respectively. In addition, a novel semantics-enhanced context embedding module is devised to generate more informative context embeddings, further improving recommendation performance. Extensive experiments on six real-world datasets demonstrate both the superiority of our method over state-of-the-art baselines and the rationality of our design.
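A generic sketch of the dual contrastive idea: the same InfoNCE-style loss is applied once to low-level (item-ID) sequence representations and once to high-level (attribute/category) representations, with in-batch negatives. The temperature, dimensions, and pairing of views are illustrative assumptions, not the paper's exact scheme.

```python
# Generic InfoNCE sketch for a dual contrastive scheme: apply the same loss
# once to low-level (item-ID) sequence embeddings and once to high-level
# (category) sequence embeddings, using in-batch negatives.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """anchor, positive: (B, D) views of the same B sequences.
    Every other row in the batch is treated as a negative."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0))                 # positives on the diagonal
    return F.cross_entropy(logits, labels)

low_a, low_b = torch.randn(64, 128), torch.randn(64, 128)     # two views, low level
high_a, high_b = torch.randn(64, 128), torch.randn(64, 128)   # two views, high level
loss = info_nce(low_a, low_b) + info_nce(high_a, high_b)      # dual contrastive terms
```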
https://arxiv.org/abs/2410.22790
With the growing demand for personalized assortment recommendations, concerns over data privacy have intensified, highlighting the urgent need for effective privacy-preserving strategies. This paper presents a novel framework for privacy-preserving dynamic assortment selection using the multinomial logit (MNL) bandit model. Our approach employs a perturbed upper confidence bound method, integrating calibrated noise into user utility estimates to balance exploration and exploitation while ensuring robust privacy protection. We rigorously prove that our policy satisfies Joint Differential Privacy (JDP), which suits dynamic environments better than traditional differential privacy and effectively mitigates inference attack risks. This analysis builds upon a novel objective perturbation technique tailored to MNL bandits, which is also of independent interest. Theoretically, we derive a near-optimal regret bound of $\tilde{O}(\sqrt{T})$ for our policy and explicitly quantify how privacy protection impacts regret. Through extensive simulations and an application to the Expedia hotel dataset, we demonstrate substantial performance improvements over the benchmark method.
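A toy sketch of the perturbed selection idea under an MNL choice model: Gaussian noise is added to estimated item utilities before the assortment is chosen. The noise scale and the greedy top-K rule are simplifications for illustration; they are not the calibrated noise or confidence bounds analyzed in the paper.

```python
# Toy sketch of perturbation in MNL assortment selection: noise is added to
# estimated utilities before choosing the top-K assortment. Not the paper's
# calibrated JDP mechanism; scales and the selection rule are simplified.
import numpy as np

rng = np.random.default_rng(0)

def mnl_choice_probs(utilities):
    """MNL probabilities over an assortment, with a no-purchase option of utility 0."""
    expu = np.exp(utilities)
    return expu / (1.0 + expu.sum())

def select_assortment(estimated_utilities, k, noise_scale):
    perturbed = estimated_utilities + rng.normal(0.0, noise_scale,
                                                 size=estimated_utilities.shape)
    return np.argsort(perturbed)[-k:]            # top-K items by perturbed utility

true_util = rng.normal(0.0, 1.0, size=20)        # 20 candidate items
est_util = true_util + rng.normal(0.0, 0.3, size=20)
assortment = select_assortment(est_util, k=4, noise_scale=0.5)
print(assortment, mnl_choice_probs(true_util[assortment]))
```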
https://arxiv.org/abs/2410.22488
As more applications of large language models (LLMs) to 3D content creation for immersive environments emerge, it is crucial to study user behaviour to identify interaction patterns and potential barriers, guiding the future design of immersive content creation and editing systems that involve LLMs. In an empirical user study with 12 participants, we combine quantitative usage data with post-experience questionnaire feedback to reveal common interaction patterns and key barriers in LLM-assisted 3D scene editing systems. We identify opportunities for improving natural language interfaces in 3D design tools and propose design recommendations for future LLM-integrated 3D content creation systems. Through this empirical study, we demonstrate that LLM-assisted interactive systems can be used productively in immersive environments.
https://arxiv.org/abs/2410.22177
Path-based explanations provide intrinsic insights into graph-based recommendation models. However, most previous work has focused on explaining an individual recommendation of an item to a user. In this paper, we propose summary explanations, i.e., explanations that highlight why a user or a group of users receives a set of item recommendations and why an item, or a group of items, is recommended to a set of users, as an effective means of providing insight into the collective behavior of the recommender. We also present a novel method for summarizing explanations using efficient graph algorithms, specifically the Steiner Tree and the Prize-Collecting Steiner Tree. Our approach reduces the size and complexity of summary explanations while preserving essential information, making explanations more comprehensible for users and more useful to model developers. Evaluations across multiple metrics demonstrate that our summaries outperform baseline explanation methods in most scenarios, across a variety of quality aspects.
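The summarization step can be illustrated with networkx's approximate Steiner tree: the terminals are the user plus the recommended items, and the returned subtree is the compact summary explanation. The toy graph and edge weights below are placeholders, not the paper's data.

```python
# Minimal sketch of summarizing explanation paths with an approximate Steiner
# tree: terminals are the user plus the recommended items, and the tree keeps
# only the connective structure needed to link them.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ("user:alice", "item:camera", 1.0),
    ("item:camera", "cat:photography", 1.0),
    ("cat:photography", "item:tripod", 1.0),
    ("cat:photography", "item:lens", 1.0),
    ("user:alice", "item:novel", 3.0),      # a costlier, less informative path
    ("item:novel", "item:lens", 3.0),
])

terminals = ["user:alice", "item:tripod", "item:lens"]   # user + recommended items
summary = steiner_tree(G, terminals, weight="weight")
print(sorted(summary.edges()))   # compact subgraph explaining both recommendations
```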
https://arxiv.org/abs/2410.22020
Sequential recommendation aims to predict the next item that interests a user by modeling the user's interest in items over time. Most existing works on sequential recommendation model users' dynamic interest in specific items while overlooking the static interest revealed by static item attribute information, e.g., category or brand. Moreover, existing works often consider only the positive excitation of a user's historical interactions on his/her next choice among candidate items while ignoring the commonly existing negative excitation, resulting in insufficient modeling of dynamic interest. Overlooking static interest and negative excitation leads to incomplete interest modeling and thus impedes recommendation performance. To this end, in this paper, we propose modeling both static interest and the negative excitation within dynamic interest to further improve recommendation performance. Accordingly, we design a novel Static-Dynamic Interest Learning (SDIL) framework featuring a novel Temporal Positive and Negative Excitation Modeling (TPNE) module for accurate sequential recommendation. TPNE is specially designed to comprehensively model dynamic interest based on temporal positive and negative excitation learning. Extensive experiments on three real-world datasets show that SDIL effectively captures both static and dynamic interest and outperforms state-of-the-art baselines.
https://arxiv.org/abs/2410.22013
Recent advancements in diffusion models have shown promising results in sequential recommendation (SR). However, current diffusion-based methods still exhibit two key limitations. First, they implicitly model the diffusion process for target item embeddings rather than the discrete target item itself, leading to inconsistency in the recommendation process. Second, existing methods rely on either implicit or explicit conditional diffusion models, limiting their ability to fully capture the context of user behavior and leading to less robust target item embeddings. In this paper, we propose the Dual Conditional Diffusion Models for Sequential Recommendation (DCRec), introducing a discrete-to-continuous sequential recommendation diffusion framework. Our framework introduces a complete Markov chain to model the transition from the reversed target item representation to the discrete item index, bridging the discrete and continuous item spaces for diffusion models and ensuring consistency with the diffusion framework. Building on this framework, we present the Dual Conditional Diffusion Transformer (DCDT) that incorporates the implicit conditional and the explicit conditional for diffusion-based SR. Extensive experiments on public benchmark datasets demonstrate that DCRec outperforms state-of-the-art methods.
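A toy sketch of the discrete-to-continuous bridge: a target item's embedding is noised with a standard DDPM forward step, and the (lightly noised) embedding, standing in for a denoiser's output, is mapped back to a discrete item index by nearest neighbor against the embedding table. The linear beta schedule and nearest-neighbor rounding are assumptions, not DCRec's learned transition.

```python
# Toy sketch of a discrete-to-continuous round trip: noise the target item's
# embedding with a DDPM forward step, then map an embedding back to a discrete
# item index by nearest neighbor. Schedule and rounding rule are assumptions.
import torch

torch.manual_seed(0)
num_items, dim, T = 1000, 64, 100
item_table = torch.randn(num_items, dim)                 # item embedding table
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_diffuse(x0, t):
    """q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    noise = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * noise

def round_to_item(x_hat):
    """Discretize a continuous embedding to the nearest item index."""
    dists = torch.cdist(x_hat.unsqueeze(0), item_table).squeeze(0)
    return int(dists.argmin())

target = 42
x_t = forward_diffuse(item_table[target], t=10)          # lightly noised target
print(round_to_item(x_t))                                 # usually recovers 42
```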
https://arxiv.org/abs/2410.21967
Public Code Review (PCR) complements a development team's internal code review through a public Software Question Answering (SQA) community, helping developers access high-quality and efficient review services. Current work on PCR mainly focuses on the reviewer's perspective, including finding a capable reviewer, predicting comment quality, and recommending or generating review comments. However, how to satisfy the review-necessity requests posted by developers, thereby increasing their visibility, is not well studied, even though visibility is a prerequisite for better review responses. To this end, we propose Knowledge-guided Prompt learning for Public Code Review (KP-PCR) to achieve developer-oriented code review request quality assurance (i.e., the request necessity prediction and tag recommendation subtasks). Specifically, we reformulate the two subtasks via 1) text prompt tuning, which converts both into Masked Language Model (MLM) tasks by constructing prompt templates with hard prompts; and 2) knowledge and code prefix tuning, which introduces external knowledge through soft prompts and uses data flow diagrams to characterize code snippets. Finally, both the request necessity prediction and tag recommendation subtasks output predictions through an answer engineering module. In addition, we analyze the time complexity of KP-PCR, which introduces knowledge through a lightweight prefix-based operation. Experimental results on a PCR dataset covering the period 2011-2023 demonstrate that KP-PCR outperforms baselines by 8.3%-28.8% in request necessity prediction and by 0.1%-29.5% in tag recommendation. The code implementation is released at this https URL.
https://arxiv.org/abs/2410.21673
Many platforms, such as e-commerce websites, offer both search and recommendation services simultaneously to better meet users' diverse needs. Recommendation services suggest items based on user preferences, while search services allow users to search for items before recommendations are provided. Since users and items are often shared between the search and recommendation domains, there is a valuable opportunity to enhance the recommendation domain by leveraging user preferences extracted from the search domain. Existing approaches either overlook the shift in user intention between these domains or fail to capture the significant impact that learning from users' search queries has on understanding their interests. In this paper, we propose a framework that learns from user search query embeddings within the context of user preferences in the recommendation domain. Specifically, user search query sequences from the search domain are used to predict the items users will click at the next time point in the recommendation domain. Additionally, the relationship between queries and items is explored through contrastive learning. To address data sparsity, a diffusion model is incorporated to infer, in a denoising manner, the positive items a user will select after searching with certain queries, which is particularly effective in preventing false positives. By effectively extracting this information, the queries are integrated into click-through rate prediction in the recommendation domain. Experimental analysis demonstrates that our model outperforms state-of-the-art models in the recommendation domain.
https://arxiv.org/abs/2410.21487
We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (each with 337 use cases) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising user preferences above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of personalised thinking. We find that prompting LLMs to consider safety-critical context significantly improves performance, unlike a generic 'harmless and helpful' instruction. Based on these findings, we propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants. Our work emphasises the need for nuanced, context-aware approaches to alignment in systems designed for persistent human interaction, aiding the development of safe and considerate AI assistants.
https://arxiv.org/abs/2410.21159
GPRec explicitly categorizes users into groups in a learnable manner and aligns them with corresponding group embeddings. We design a dual group embedding space to offer a diverse perspective on group preferences by contrasting positive and negative patterns. On the individual level, GPRec identifies personal preferences from ID-like features and refines the obtained individual representations to be independent of group ones, thereby providing a robust complement to the group-level modeling. We also present several strategies for flexibly integrating GPRec into various DRS models. Rigorous testing of GPRec on three public datasets demonstrates significant improvements in recommendation quality.
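A rough sketch of the group-level machinery described above: a learnable soft assignment maps each user representation onto positive and negative group embedding tables, and a simple decorrelation penalty keeps the individual representation independent of the group one. The number of groups, the contrastive combination, and the penalty form are assumptions for illustration.

```python
# Illustrative sketch of learnable group assignment with dual (positive and
# negative) group embeddings plus a decorrelation penalty that pushes the
# individual representation to be independent of the group representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupModule(nn.Module):
    def __init__(self, user_dim=64, num_groups=8):
        super().__init__()
        self.assign = nn.Linear(user_dim, num_groups)       # learnable grouping
        self.pos_groups = nn.Embedding(num_groups, user_dim)
        self.neg_groups = nn.Embedding(num_groups, user_dim)

    def forward(self, user_repr):
        weights = F.softmax(self.assign(user_repr), dim=-1)          # (B, G)
        pos = weights @ self.pos_groups.weight                        # (B, D)
        neg = weights @ self.neg_groups.weight
        group_repr = pos - neg                                        # contrasting views
        # decorrelation penalty between individual and group representations
        indep_penalty = F.cosine_similarity(user_repr, group_repr, dim=-1).pow(2).mean()
        return group_repr, indep_penalty

user_repr = torch.randn(32, 64)
group_repr, penalty = GroupModule()(user_repr)
```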
https://arxiv.org/abs/2410.20730
In recent years, the widespread adoption of Large Language Models (LLMs) has sparked interest in their potential for application within the military domain. However, the current generation of LLMs demonstrates suboptimal performance on Army use cases, due to the prevalence of domain-specific vocabulary and jargon. To fully leverage LLMs in-domain, many organizations have turned to fine-tuning to circumvent the prohibitive costs involved in training new LLMs from scratch. In light of this trend, we explore the viability of adapting open-source LLMs for use in the Army domain in order to address their existing lack of domain specificity. Our investigations have resulted in the creation of three distinct generations of TRACLM, a family of LLMs fine-tuned by The Research and Analysis Center (TRAC), Army Futures Command (AFC). Through continuous refinement of our training pipeline, each successive iteration of TRACLM displayed improved capabilities when applied to Army tasks and use cases. Furthermore, throughout our fine-tuning experiments, we recognized the need for an evaluation framework that objectively quantifies the Army domain-specific knowledge of LLMs. To address this, we developed MilBench, an extensible software framework that efficiently evaluates the Army knowledge of a given LLM using tasks derived from doctrine and assessments. We share preliminary results, models, methods, and recommendations on the creation of TRACLM and MilBench. Our work significantly informs the development of LLM technology across the DoD and augments senior leader decisions with respect to artificial intelligence integration.
https://arxiv.org/abs/2410.20297
Large-scale recommendation models are currently the dominant workload for many large Internet companies. These recommenders are characterized by massive embedding tables that are sparsely accessed by indices for user and item features. The size of these 1TB+ tables imposes a severe memory bottleneck for the training and inference of recommendation models. In this work, we propose a novel recommendation framework that is small, powerful, and efficient to run and train, based on the state-of-the-art Deep Learning Recommendation Model (DLRM). The proposed framework makes inference more efficient on cloud servers, explores the possibility of deploying powerful recommenders on smaller edge devices, and reduces the communication overhead of distributed training under data-parallelism settings. Specifically, we show that quantization-aware training (QAT) can impose a strong regularization effect that mitigates the severe overfitting issues suffered by DLRMs. Consequently, we achieve INT4 quantization of DLRM models without any accuracy drop. We further propose two techniques that improve and accelerate the conventional QAT workload specifically for the embedding tables in recommendation models. Furthermore, to achieve efficient training, we quantize the gradients of the embedding tables into INT8 on top of well-supported gradient sparsification. We show that combining gradient sparsification and quantization significantly reduces the amount of communication. Briefly, DQRM models with INT4 can achieve 79.07% accuracy on Kaggle with a 0.27 GB model size, and 81.21% accuracy on the Terabyte dataset with 1.57 GB, even outperforming FP32 DLRMs that have much larger model sizes (2.16 GB on Kaggle and 12.58 GB on Terabyte).
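A minimal sketch of the two quantization ingredients, assuming symmetric per-tensor scales: fake INT4 quantization of looked-up embedding rows with a straight-through estimator during training, and INT8 quantization of the embedding gradient before communication. This illustrates the general QAT and gradient-quantization pattern, not the exact DQRM recipe.

```python
# Minimal sketch of quantization-aware training for an embedding table:
# symmetric fake quantization with a straight-through estimator so gradients
# flow through the rounding, plus INT8 gradient quantization before
# communication. Scale choices are simplifications, not the DQRM recipe.
import torch
import torch.nn as nn

def fake_quant(x, bits=4):
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for INT4
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()                      # straight-through estimator

table = nn.Embedding(10_000, 64)
ids = torch.randint(0, 10_000, (256,))
emb = fake_quant(table(ids), bits=4)                 # INT4-quantized rows in the forward pass
loss = emb.pow(2).mean()
loss.backward()                                      # gradients reach table.weight

# Sketch of INT8 gradient quantization before communication in data parallelism.
grad = table.weight.grad
gscale = grad.abs().max().clamp(min=1e-8) / 127
int8_grad = torch.clamp(torch.round(grad / gscale), -128, 127).to(torch.int8)
```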
https://arxiv.org/abs/2410.20046
Agents powered by large language models have shown remarkable reasoning and execution capabilities, attracting researchers to explore their potential in the recommendation domain. Previous studies have primarily focused on enhancing the capabilities of either recommendation agents or user agents independently, but have not considered the interaction and collaboration between recommendation agents and user agents. To address this gap, we propose a novel framework named FLOW, which achieves collaboration between the recommendation agent and the user agent by introducing a feedback loop. Specifically, the recommendation agent refines its understanding of the user's preferences by analyzing the user agent's feedback on previously suggested items, while the user agent leverages suggested items to uncover deeper insights into the user's latent interests. This iterative refinement process enhances the reasoning capabilities of both the recommendation agent and the user agent, enabling more precise recommendations and a more accurate simulation of user behavior. To demonstrate the effectiveness of the feedback loop, we evaluate both recommendation performance and user simulation performance on three widely used recommendation domain datasets. The experimental results indicate that the feedback loop can simultaneously improve the performance of both the recommendation and user agents.
https://arxiv.org/abs/2410.20027
This paper proposes a machine learning approach for classifying classical and new Egyptian music by composer and generating new similar music. The proposed system utilizes a convolutional neural network (CNN) for classification and a CNN autoencoder for generation. The dataset used in this project consists of new and classical Egyptian music pieces composed by different composers. To classify the music by composer, each sample is normalized and transformed into a mel spectrogram. The CNN model is trained on the dataset using the mel spectrograms as input features and the composer labels as output classes. The model achieves 81.4\% accuracy in classifying the music by composer, demonstrating the effectiveness of the proposed approach. To generate new music similar to the original pieces, a CNN autoencoder is trained on a similar dataset. The model is trained to encode the mel spectrograms of the original pieces into a lower-dimensional latent space and then decode them back into the original mel spectrogram. The generated music is produced by sampling from the latent space and decoding the samples back into mel spectrograms, which are then transformed into audio. In conclusion, the proposed system provides a promising approach to classifying and generating classical Egyptian music, which can be applied in various musical applications, such as music recommendation systems, music production, and music education.
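A sketch of the preprocessing and classification pipeline described above: audio is normalized, converted to a log-mel spectrogram with librosa, and passed to a small CNN whose outputs are composer classes. The file path, number of composers, and layer sizes are illustrative, not the paper's architecture.

```python
# Sketch of the composer-classification pipeline: load audio, compute a
# log-mel spectrogram, and classify it with a small CNN. Layer sizes and the
# number of composers are illustrative placeholders.
import librosa
import numpy as np
import torch
import torch.nn as nn

def to_log_mel(path, sr=22050, n_mels=128):
    y, sr = librosa.load(path, sr=sr)
    y = y / (np.max(np.abs(y)) + 1e-8)                       # peak normalization
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)              # (n_mels, frames)

class ComposerCNN(nn.Module):
    def __init__(self, num_composers=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(32 * 4 * 4, num_composers)

    def forward(self, x):                                     # x: (B, 1, n_mels, frames)
        return self.head(self.features(x).flatten(1))

# mel = to_log_mel("piece.wav")                              # hypothetical file path
mel = np.random.randn(128, 400).astype(np.float32)           # stand-in spectrogram
logits = ComposerCNN()(torch.from_numpy(mel)[None, None])
```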
https://arxiv.org/abs/2410.19719
Language agents have recently been used to simulate human behavior and user-item interactions for recommendation systems. However, current language agent simulations do not understand the relationships between users and items, leading to inaccurate user profiles and ineffective recommendations. In this work, we explore the utility of Knowledge Graphs (KGs), which contain extensive and reliable relationships between users and items, for recommendation. Our key insight is that the paths in a KG can capture complex relationships between users and items, eliciting the underlying reasons for user preferences and enriching user profiles. Leveraging this insight, we propose Knowledge Graph Enhanced Language Agents (KGLA), a framework that unifies language agents and KGs for recommendation systems. In the simulated recommendation scenario, we position the user and item within the KG and integrate KG paths as natural language descriptions into the simulation. This allows language agents to interact with each other and discover sufficient rationale behind their interactions, making the simulation more accurate and better aligned with real-world cases, thus improving recommendation performance. Our experimental results show that KGLA significantly improves recommendation performance (with a 33%-95% boost in NDCG@1 across three widely used benchmarks) compared to the previous best baseline method.
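A simple sketch of how KG paths might be turned into natural-language evidence for a language agent's profile or memory. The relation phrasing and memory format are illustrative assumptions, not KGLA's exact templates.

```python
# Simple sketch of turning KG paths into natural-language evidence that a
# language agent can read. Phrasing and memory format are assumptions.
def verbalize_path(path):
    """path: list of (head, relation, tail) triples forming a user-item path."""
    steps = [f"{h} --{r}--> {t}" for h, r, t in path]
    head, tail = path[0][0], path[-1][2]
    return f"{head} is connected to {tail} via: " + "; ".join(steps)

path = [
    ("user:alice", "purchased", "item:hiking_boots"),
    ("item:hiking_boots", "belongs_to", "category:outdoor"),
    ("category:outdoor", "contains", "item:trekking_poles"),
]
agent_memory = [verbalize_path(path)]   # appended to the agent's profile context
print(agent_memory[0])
```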
https://arxiv.org/abs/2410.19627
This paper explores the application of prompt engineering to enhance the performance of large language models (LLMs) in the domain of Traditional Chinese Medicine (TCM). We propose TCM-Prompt, a framework that integrates various pre-trained language models (PLMs), templates, tokenization, and verbalization methods, allowing researchers to easily construct and fine-tune models for specific TCM-related tasks. We conducted experiments on disease classification, syndrome identification, herbal medicine recommendation, and general NLP tasks, demonstrating the effectiveness and superiority of our approach compared to baseline methods. Our findings suggest that prompt engineering is a promising technique for improving the performance of LLMs in specialized domains like TCM, with potential applications in digitalization, modernization, and personalized medicine.
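A hedged sketch of the prompt-learning pattern the framework builds on: a hard-prompt template casts syndrome identification as masked-token prediction, and a verbalizer maps label words to classes. The backbone model, template wording, and label words are illustrative choices, not TCM-Prompt's configuration.

```python
# Hedged sketch of hard-prompt masked-language-model classification with a
# verbalizer. Backbone, template, and label words are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

verbalizer = {"cold": "cold-type syndrome", "heat": "heat-type syndrome"}
text = ("Symptoms: aversion to cold, pale tongue, slow pulse. "
        f"The syndrome type is {tok.mask_token}.")

inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                           # (1, L, V)

mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero()[0, 1]
label_ids = tok.convert_tokens_to_ids(list(verbalizer))       # ids of the label words
scores = logits[0, mask_pos, label_ids].softmax(-1)
print(dict(zip(verbalizer.values(), scores.tolist())))
```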
https://arxiv.org/abs/2410.19451
Traditional recommender systems rely on high-dimensional (latent) embeddings for modeling user-item interactions, often resulting in opaque representations that lack interpretability. Moreover, these systems offer limited control to users over their recommendations. Inspired by recent work, we introduce TExtuAl Representations for Scrutable recommendations (TEARS) to address these challenges. Instead of representing a user's interests through a latent embedding, TEARS encodes them in natural text, providing transparency and allowing users to edit them. To do so, TEARS uses a modern LLM to generate user summaries based on user preferences. We find the summaries capture user preferences uniquely. Using these summaries, we take a hybrid approach where we use an optimal transport procedure to align the summaries' representation with the learned representation of a standard VAE for collaborative filtering. We find this approach can surpass the performance of three popular VAE models while providing user-controllable recommendations. We also analyze the controllability of TEARS through three simulated user tasks to evaluate the effectiveness of a user editing its summary.
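A minimal sketch of the alignment step, assuming entropic optimal transport with uniform marginals: a few Sinkhorn iterations produce a transport plan between text-summary embeddings and VAE user embeddings, and the transport cost can serve as an alignment loss. The cost normalization, epsilon, and iteration count are illustrative; this is not TEARS' exact procedure.

```python
# Minimal Sinkhorn sketch for aligning two embedding sets with entropic
# optimal transport; the transport cost acts as an alignment loss.
import torch

def sinkhorn_alignment(summary_emb, vae_emb, eps=0.1, n_iters=50):
    """summary_emb, vae_emb: (N, D). Returns (transport plan, alignment cost)."""
    cost = torch.cdist(summary_emb, vae_emb) ** 2             # (N, N) squared distances
    cost = cost / cost.max()                                  # normalize for stability
    K = torch.exp(-cost / eps)
    a = torch.full((cost.size(0),), 1.0 / cost.size(0))       # uniform marginals
    b = torch.full((cost.size(1),), 1.0 / cost.size(1))
    u = torch.ones_like(a)
    for _ in range(n_iters):                                  # Sinkhorn scaling
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return plan, (plan * cost).sum()

summary_emb = torch.randn(128, 32)
vae_emb = torch.randn(128, 32)
plan, align_loss = sinkhorn_alignment(summary_emb, vae_emb)
```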
https://arxiv.org/abs/2410.19302
Companies, including market rivals, have long collaborated on the development of open source software (OSS), resulting in a tangle of co-operation and competition known as "open source co-opetition". While prior work investigates open source co-opetition in OSS projects that are hosted by vendor-neutral foundations, we have a limited understanding thereof in OSS projects that are hosted and governed by one company. Given their prevalence, it is timely to investigate open source co-opetition in such contexts. Towards this end, we conduct a mixed-methods analysis of three company-hosted OSS projects in the artificial intelligence (AI) industry: Meta's PyTorch (prior to its donation to the Linux Foundation), Google's TensorFlow, and Hugging Face's Transformers. We contribute three key findings. First, while the projects exhibit similar code authorship patterns between host and external companies (80%/20% of commits), collaborations are structured differently (e.g., decentralised vs. hub-and-spoke networks). Second, host and external companies engage in strategic, non-strategic, and contractual collaborations, with varying incentives and collaboration practices. Some of the observed collaborations are specific to the AI industry (e.g., hardware-software optimizations or AI model integrations), while others are typical of the broader software industry (e.g., bug fixing or task outsourcing). Third, single-vendor governance creates a power imbalance that influences open source co-opetition practices and possibilities, from the host company's singular decision-making power (e.g., the risk of license change) to their community involvement strategy (e.g., from over-control to over-delegation). We conclude with recommendations for future research.
https://arxiv.org/abs/2410.18241