In the e-commerce domain, the accurate extraction of attribute-value pairs from product listings (e.g., Brand: Apple) is crucial for enhancing search and recommendation systems. The automation of this extraction process is challenging due to the vast diversity of product categories and their respective attributes, compounded by the lack of extensive, accurately annotated training datasets and the demand for low latency to meet the real-time needs of e-commerce platforms. To address these challenges, we introduce GenToC, a novel two-stage model for extracting attribute-value pairs from product titles. GenToC is designed to train with partially-labeled data, leveraging incomplete attribute-value pairs and obviating the need for a fully annotated dataset. Moreover, we introduce a bootstrapping method that enables GenToC to progressively refine and expand its training dataset. This enhancement substantially improves the quality of data available for training other neural network models that are typically faster but are inherently less capable than GenToC in terms of their capacity to handle partially-labeled data. By supplying an enriched dataset for training, GenToC significantly advances the performance of these alternative models, making them more suitable for real-time deployment. Our results highlight the unique capability of GenToC to learn from a limited set of labeled data and to contribute to the training of more efficient models, marking a significant leap forward in the automated extraction of attribute-value pairs from product titles. GenToC has been successfully integrated into India's largest B2B e-commerce platform, this http URL, achieving a significant increase of 21.1% in recall over the existing deployed system while maintaining a high precision of 89.5% in this challenging task.
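The bootstrapping idea above (train on partial labels, promote confident predictions into the training set, retrain) can be sketched as a generic self-training loop. Everything below is illustrative: the `toy_train`/`toy_predict` stand-ins, the confidence threshold, and the data shapes are assumptions, not GenToC's actual procedure.

```python
def bootstrap_labels(labeled, unlabeled, train, predict, rounds=3, threshold=0.9):
    """Generic self-training loop (a sketch, not GenToC's exact algorithm).

    labeled   : list of (title, attribute_value_pairs) seed examples
    unlabeled : list of product titles without annotations
    train     : callable(examples) -> model
    predict   : callable(model, title) -> (attribute_value_pairs, confidence)
    """
    pool = list(unlabeled)
    dataset = list(labeled)
    for _ in range(rounds):
        model = train(dataset)
        kept, rest = [], []
        for title in pool:
            pairs, confidence = predict(model, title)
            (kept if confidence >= threshold else rest).append((title, pairs))
        # Promote only high-confidence predictions into the training set.
        dataset.extend(kept)
        pool = [title for title, _ in rest]
        if not kept:
            break  # no progress this round; stop early
    return dataset
```

The enriched `dataset` is what would then be handed to a faster downstream model.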
https://arxiv.org/abs/2405.10918
This review aims to systematically assess the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia and its impact on the rehabilitation process. We selected 70 studies from 2012 to the present, focusing on the applications, technology categories, products, and data types of machine learning, deep learning, reinforcement learning, and other technologies in mental health interventions and management. The results indicate that AI can be widely used in symptom monitoring, relapse risk prediction, and rehabilitation treatment by analyzing ecological momentary assessment, behavioral, and speech data. This review further explores the potential challenges and future directions of emerging AI-based products, technologies, and analytical methods, such as social media analysis, serious games, and large language models, in rehabilitation. In summary, this study systematically reviews the application status of AI in schizophrenia rehabilitation management and provides valuable insights and recommendations for future research paths.
https://arxiv.org/abs/2405.10883
One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often have seemingly little ideological overlap. We define the task of tailoring vaccine interventions to a Common-Ground Opinion (CGO). Tailoring responses to a CGO involves meaningfully improving the answer by relating it to an opinion or belief the reader holds. In this paper we introduce TAILOR-CGO, a dataset for evaluating how well responses are tailored to provided CGOs. We benchmark several major LLMs on this task, finding that GPT-4-Turbo performs significantly better than the others. We also build automatic evaluation metrics, including an efficient and accurate BERT model that outperforms finetuned LLMs, investigate how to successfully tailor vaccine messaging to CGOs, and provide actionable recommendations from this investigation. Code and model weights: this https URL Dataset: this https URL
https://arxiv.org/abs/2405.10861
Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
https://arxiv.org/abs/2405.10632
Large language model (LLM)-based recommender models that bridge users and items through textual prompts for effective semantic reasoning have gained considerable attention. However, few methods consider the underlying rationales behind interactions, such as user preferences and item attributes, limiting the reasoning capability of LLMs for recommendations. This paper proposes a rationale distillation recommender (RDRec), a compact model designed to learn rationales generated by a larger language model (LM). By leveraging rationales from reviews related to users and items, RDRec remarkably specifies their profiles for recommendations. Experiments show that RDRec achieves state-of-the-art (SOTA) performance in both top-N and sequential recommendations. Our source code is released at this https URL.
https://arxiv.org/abs/2405.10587
The expansion of streaming media and e-commerce has led to a boom in recommendation systems, including sequential recommendation systems, which consider the user's previous interactions with items. In recent years, research has focused on architectural improvements, such as transformer blocks, and on feature extraction that can augment the model's information. Among these features are context and attributes. Of particular importance is the temporal footprint, which is often considered part of the context and treated in previous publications as interchangeable with positional information; other publications use positional encodings while paying little attention to them. In this paper, we analyse positional encodings, showing that they provide relative information between items that is not inferable from the temporal footprint. Furthermore, we evaluate different encodings and how they affect metrics and stability using Amazon datasets, adding some new encodings along the way to address these problems. We find that choosing the correct positional encoding can reach new state-of-the-art results but, more importantly, that certain encodings stabilise training.
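For readers unfamiliar with the encodings being compared, here is a minimal NumPy sketch of the standard sinusoidal positional encoding (one common choice among the several such papers evaluate; the paper itself does not prescribe this exact variant):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard Transformer sinusoidal encoding.

    Returns an array of shape (seq_len, d_model) that is added to the item
    embeddings of a user's interaction sequence, giving the model relative
    position information independent of any temporal footprint.
    """
    positions = np.arange(seq_len)[:, np.newaxis]    # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]   # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe
```

Because each position is a fixed rotation of the previous one, the encoding of position `p + k` is a linear function of the encoding of `p`, which is exactly the kind of relative information discussed above.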
https://arxiv.org/abs/2405.10436
Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. However, this approach becomes less feasible as the number of developers of advanced AI grows, and impedes beneficial use-cases as well as harmful ones. In response, we urge a complementary approach: increasing societal adaptation to advanced AI, that is, reducing the expected negative impacts from a given level of diffusion of a given AI capability. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems, illustrated with examples in election manipulation, cyberterrorism, and loss of control to AI decision-makers. We discuss a three-step cycle that society can implement to adapt to AI. Increasing society's ability to implement this cycle builds its resilience to advanced AI. We conclude with concrete recommendations for governments, industry, and third-parties.
https://arxiv.org/abs/2405.10295
Recent years have witnessed the rapid development of short videos, which usually contain both visual and audio modalities. Background music is important to short videos and can significantly influence the emotions of viewers. However, at present, the background music of a short video is generally chosen by the video producer, and there is a lack of automatic music recommendation methods for short videos. This paper introduces MVBind, an innovative Music-Video embedding space Binding model for cross-modal retrieval. MVBind operates as a self-supervised approach, acquiring inherent knowledge of intermodal relationships directly from data, without the need for manual annotations. Additionally, to compensate for the lack of a corresponding music-video pair dataset for short videos, we construct SVM-10K (Short Video with Music-10K), a dataset consisting mainly of meticulously selected short videos. On this dataset, MVBind shows significantly improved performance compared to other baseline methods. The constructed dataset and code will be released to facilitate future research.
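The abstract does not specify MVBind's training objective. A common choice for self-supervised cross-modal binding is a symmetric CLIP-style contrastive loss, sketched here under that assumption (matching video-music pairs sit on the diagonal of the similarity matrix):

```python
import numpy as np

def clip_style_loss(video_embs, music_embs, temperature=0.07):
    """Symmetric contrastive objective for binding video and music
    embeddings in a shared space (an illustrative assumption; the paper
    does not state its exact loss). Rows of the two inputs are paired.
    """
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    m = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    logits = (v @ m.T) / temperature  # (n, n) cosine similarities

    def cross_entropy_diag(l):
        # Cross-entropy with diagonal (matching-pair) targets.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average of video-to-music and music-to-video directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Minimizing this pulls true pairs together and pushes mismatched pairs apart, which is what enables retrieval of music for a new video at inference time.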
https://arxiv.org/abs/2405.09286
Traditional recommendation proposals, including content-based and collaborative filtering, usually focus on similarity between items or users. Existing approaches lack ways of introducing unexpectedness into recommendations, prioritizing globally popular items over exposing users to unforeseen items. This investigation aims to design and evaluate a novel layer on top of recommender systems, suited to incorporating relational information and suggesting items with a user-defined degree of surprise. We propose a Knowledge Graph (KG) based recommender system that encodes user interactions on item catalogs. Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations. We hypothesize that surprisingness correlates with certain network metrics, treating user profiles as subgraphs within a larger catalog KG. The resulting solution reranks recommendations based on their impact on structural graph metrics. Our research contributes to optimizing recommendations to reflect these metrics. We experimentally evaluate our approach on two datasets: LastFM listening histories and synthetic Netflix viewing profiles. We find that reranking items based on complex network metrics leads to a more unexpected and surprising composition of recommendation lists.
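As a toy illustration of reranking by a structural signal: the metric below (inverse connectivity between a candidate and the user's profile subgraph) is a stand-in for the richer complex-network metrics the paper studies, and all names are hypothetical.

```python
def surprise_rerank(adjacency, profile, candidates, relevance, alpha=0.5):
    """Rerank candidates by blending base relevance with a structural
    'surprise' signal: items weakly connected to the user's profile
    subgraph in the catalog KG score higher on surprise.

    adjacency  : dict mapping each node to the set of its KG neighbours
    profile    : set of nodes the user has interacted with
    candidates : list of candidate item nodes
    relevance  : dict mapping candidate -> base recommendation score in [0, 1]
    alpha      : weight of the surprise term (the user-defined degree
                 of surprise from the abstract)
    """
    scores = {}
    for item in candidates:
        links = len(adjacency.get(item, set()) & profile)
        surprise = 1.0 / (1.0 + links)  # toy metric: inverse connectivity
        scores[item] = (1 - alpha) * relevance[item] + alpha * surprise
    return sorted(candidates, key=lambda i: scores[i], reverse=True)
```

With `alpha = 0` this degenerates to ordinary relevance ranking; raising `alpha` trades accuracy for unexpectedness, mirroring the user-defined surprise knob described above.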
https://arxiv.org/abs/2405.08465
Factorization-based models have gained popularity since the Netflix challenge (2007). Since then, various factorization-based models have been developed, and these models have been proven to be efficient in predicting users' ratings towards items. A major concern is that explaining the recommendations generated by such methods is non-trivial because the explicit meaning of the latent factors they learn is not always clear. In response, we propose a novel model that combines factorization-based methods with argumentation frameworks (AFs). The integration of AFs provides clear meaning at each stage of the model, enabling it to produce easily understandable explanations for its recommendations. In this model, for every user-item interaction, an AF is defined in which the features of items are considered as arguments, and the users' ratings towards these features determine the strength and polarity of these arguments. This perspective allows our model to treat feature attribution as a structured argumentation procedure, where each calculation is marked with explicit meaning, enhancing its inherent interpretability. Additionally, our framework seamlessly incorporates side information, such as user contexts, leading to more accurate predictions. We anticipate at least three practical applications for our model: creating explanation templates, providing interactive explanations, and generating contrastive explanations. Through testing on real-world datasets, we have found that our model, along with its variants, not only surpasses existing argumentation-based methods but also competes effectively with current context-free and context-aware methods.
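A minimal sketch of the argument-as-feature idea: each feature the user has rated becomes an argument with a polarity (pro or con) and a strength, and the prediction aggregates them. The rating scale, the linear aggregation, and the helper names below are illustrative assumptions, not the paper's exact semantics.

```python
def predict_with_arguments(item_features, user_feature_ratings, base=3.0):
    """Feature-argument aggregation sketch (illustrative only).

    item_features        : dict feature -> weight of that feature in the item
    user_feature_ratings : dict feature -> user's 1-5 rating of the feature
    Returns (predicted_rating, list of (feature, polarity, strength, weight)).
    """
    arguments = []
    for feature, weight in item_features.items():
        rating = user_feature_ratings.get(feature)
        if rating is None:
            continue  # unrated feature contributes no argument
        strength = abs(rating - base) / 2.0  # in [0, 1]
        polarity = 1.0 if rating > base else (-1.0 if rating < base else 0.0)
        arguments.append((feature, polarity, strength, weight))
    adjustment = sum(p * s * w for _, p, s, w in arguments)
    prediction = max(1.0, min(5.0, base + 2.0 * adjustment))
    return prediction, arguments
```

Because each `(feature, polarity, strength)` triple is explicit, the returned argument list can be rendered directly into the explanation templates the abstract anticipates.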
https://arxiv.org/abs/2405.08131
The novel coronavirus disease (COVID-19), a highly infectious respiratory disease caused by SARS-CoV-2, has emerged as an unprecedented healthcare crisis. The pandemic has had a devastating impact on the health, well-being, and economy of the global population. Early screening and diagnosis of symptomatic patients plays a crucial role in isolating patients to help stop community transmission, as well as in providing early treatment that helps reduce the mortality rate. Although the RT-PCR test is the gold standard for COVID-19 testing, it is a manual, laborious, time-consuming, uncomfortable, and invasive process. Due to its accessibility, availability, lower cost, ease of sanitisation, and portable setup, chest X-ray imaging can serve as an effective screening and diagnostic tool. In this study, we first highlight the limitations of existing datasets and studies in terms of data quality, data imbalance, and evaluation strategy. Second, we curate a large-scale COVID-19 chest X-ray dataset from many publicly available COVID-19 imaging databases and propose a pre-processing pipeline to improve the quality of the dataset. We then propose CoVScreen, a CNN architecture trained and tested on the curated dataset. Experimental results across different classification scenarios and various evaluation metrics demonstrate the effectiveness of the proposed methodology in screening for COVID-19 infection.
https://arxiv.org/abs/2405.07674
Last year witnessed considerable interest in Large Language Models (LLMs) for their potential applications in recommender systems, where they may mitigate the persistent issue of data sparsity. Though great efforts have been made toward user-item graph augmentation for better graph-based recommendation performance, such methods may fail on the dynamic graph recommendation task, which involves both structural and temporal graph dynamics and the inherent complexity of processing time-evolving data. To bridge this gap, in this paper we propose a novel framework, called DynLLM, to handle the dynamic graph recommendation task with LLMs. Specifically, DynLLM harnesses the power of LLMs to generate multi-faceted user profiles based on the rich textual features of historical purchase records, including crowd segments, personal interests, preferred categories, and favored brands, which in turn supplement and enrich the underlying relationships between users and items. Along this line, to fuse the multi-faceted profiles with temporal graph embeddings, we engage LLMs to derive the corresponding profile embeddings and further employ a distilled attention mechanism to refine them, alleviating noisy signals while assessing and adjusting the relevance of each distilled facet embedding for seamless integration with the temporal graph embeddings from continuous-time dynamic graphs (CTDGs). Extensive experiments on two real e-commerce datasets validate the superior improvements of DynLLM over a wide range of state-of-the-art baseline methods.
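The relevance-weighted fusion of facet embeddings with the temporal graph embedding might look roughly like the pooling below. This is a sketch only: the paper's distilled attention mechanism is learned, whereas this version uses a fixed soft attention, and all shapes and the final merge are assumptions.

```python
import numpy as np

def fuse_facet_embeddings(facet_embs, graph_emb):
    """Relevance-weighted facet fusion (illustrative sketch).

    facet_embs : (n_facets, d) LLM-derived profile embeddings, one per
                 facet (crowd segment, interests, categories, brands)
    graph_emb  : (d,) temporal graph embedding of the user
    """
    scores = facet_embs @ graph_emb         # relevance of each facet
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax attention weights
    profile = weights @ facet_embs          # fused profile vector
    return (profile + graph_emb) / 2.0      # simple integration step
```

Facets that align poorly with the user's temporal graph embedding receive near-zero weight, which is one way noisy LLM-generated signals could be attenuated.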
https://arxiv.org/abs/2405.07580
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic and trustworthy. The emergence of Large Language Models (LLMs) has marked the onset of a new epoch in computational capabilities, exhibiting human-level intelligence in various tasks. Research efforts have been made to utilize LLMs for building user simulators to evaluate the performance of CRS. Although these efforts showcase innovation, they are accompanied by certain limitations. In this work, we introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators across various stages via a plugin manager. CSHI customizes the simulation of user behavior and interactions to provide a more lifelike and convincing user interaction experience. Through experiments and case studies in two conversational recommendation scenarios, we show that our framework can adapt to a variety of conversational recommendation settings and effectively simulate users' personalized preferences. Consequently, our simulator is able to generate feedback that closely mirrors that of real users. This facilitates a reliable assessment of existing CRS studies and promotes the creation of high-quality conversational recommendation datasets.
https://arxiv.org/abs/2405.08035
This study explores the application of recurrent neural networks to recognizing emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments are conducted on a dataset of 900 audio clips labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures may perform comparably or even superiorly to more complex models, particularly on smaller datasets. We also run the same experiments on two larger datasets: one augmented from our original dataset and one drawn from other sources. This research not only enhances our understanding of the emotional impact of music but also demonstrates the potential of neural networks in creating more personalized and emotionally resonant music recommendation and therapy systems.
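The quadrant labeling itself is straightforward: Russell's circumplex model places emotions on valence (pleasant vs. unpleasant) and arousal (energetic vs. calm) axes. A sketch of mapping a (valence, arousal) pair to the four quadrants, assuming values centered at zero (the emotion words attached to each quadrant are conventional examples, not the paper's exact labels):

```python
def russell_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair to one of the four quadrants of
    Russell's circumplex model of affect. Inputs are assumed centered
    at 0; boundary values fall into the non-negative quadrant.
    """
    if valence >= 0 and arousal >= 0:
        return "Q1: happy/excited (+valence, +arousal)"
    if valence < 0 and arousal >= 0:
        return "Q2: angry/tense (-valence, +arousal)"
    if valence < 0:
        return "Q3: sad/depressed (-valence, -arousal)"
    return "Q4: calm/relaxed (+valence, -arousal)"
```

In a pipeline like the one above, a regressor would first predict `(valence, arousal)` from the Librosa features, and this mapping would supply the four-way class label.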
https://arxiv.org/abs/2405.06747
Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement--some challenged by perspectivist approaches, and some that remain to be addressed--as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.
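The contrast between the conventional pipeline and the perspectivist view can be made concrete: majority voting collapses annotator labels to a single "gold" label and discards disagreement, while keeping the full label distribution preserves it as information. A minimal sketch (the label names are hypothetical):

```python
from collections import Counter

def aggregate_majority(labels):
    """Conventional pipeline: collapse annotator labels into one gold label,
    discarding any disagreement among annotators."""
    return Counter(labels).most_common(1)[0][0]

def label_distribution(labels):
    """Perspectivist alternative: keep disagreement as a soft label
    distribution over the annotators' judgments."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}
```

A model trained on the soft distribution can learn that an example is genuinely contested, which the majority-vote label makes invisible.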
https://arxiv.org/abs/2405.05860
Given the increasing demand for mental health assistance, artificial intelligence (AI), particularly large language models (LLMs), may be valuable for integration into automated clinical support systems. In this work, we leverage a decision transformer architecture for topic recommendation in counseling conversations between patients and mental health professionals. The architecture is utilized for offline reinforcement learning, and we extract states (dialogue turn embeddings), actions (conversation topics), and rewards (scores measuring the alignment between patient and therapist) from previous turns within a conversation to train a decision transformer model. We demonstrate an improvement over baseline reinforcement learning methods, and propose a novel system of utilizing our model's output as synthetic labels for fine-tuning a large language model for the same task. Although our implementation based on LLaMA-2 7B has mixed results, future work can undoubtedly build on the design.
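The offline-RL data preparation described above can be sketched as assembling return-to-go/state/action triples per conversation, the input format a decision transformer consumes. The field names and the discounting below are assumptions for illustration, not the paper's exact preprocessing.

```python
def build_trajectories(conversation, gamma=1.0):
    """Assemble (return-to-go, state, action) triples for offline RL,
    following the state/action/reward definitions above (a sketch).

    conversation : list of turns, each a dict with
        'embedding' (state: dialogue turn embedding),
        'topic'     (action: conversation topic),
        'alignment' (reward: patient-therapist alignment score)
    """
    rewards = [turn["alignment"] for turn in conversation]
    triples = []
    for t, turn in enumerate(conversation):
        # Return-to-go: (discounted) sum of rewards from this turn onward.
        rtg = sum(gamma ** (k - t) * rewards[k] for k in range(t, len(rewards)))
        triples.append((rtg, turn["embedding"], turn["topic"]))
    return triples
```

At inference time, conditioning on a high return-to-go asks the model for the topic sequence expected to maximize patient-therapist alignment.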
https://arxiv.org/abs/2405.05060
Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based methods to graph filtering-based methods. Recently, lightweight methods that require almost no training have been proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficiency, and robustness. In particular, there are no well-designed closed-form studies for balanced CF in terms of the aforementioned trade-offs. In this paper, we design SVD-AE, a simple yet effective singular value decomposition (SVD)-based linear autoencoder, whose closed-form solution can be defined based on SVD for CF. SVD-AE does not require an iterative training process, as its closed-form solution can be calculated at once. Furthermore, given the noisy nature of the rating matrix, we explore the robustness against such noisy interactions of existing CF methods and our SVD-AE. As a result, we demonstrate that our simple design choice based on truncated SVD can strengthen the noise robustness of the recommendation while improving efficiency. Code is available at this https URL.
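A closed-form, training-free scorer in the spirit of truncated-SVD collaborative filtering can be written in a few lines. This is a sketch of the general technique, not the paper's exact SVD-AE formulation:

```python
import numpy as np

def truncated_svd_scores(R: np.ndarray, rank: int) -> np.ndarray:
    """Closed-form CF scorer: project the user-item rating matrix onto
    its top singular vectors and use the low-rank reconstruction as
    recommendation scores. No iterative training is needed; the solution
    is computed at once. Truncating the rank discards high-frequency
    components, which is where much of the interaction noise lives.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
```

Per user (row), the reconstructed scores over unobserved items can then be sorted to produce the recommendation list; `rank` controls the efficiency/robustness trade-off discussed above.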
https://arxiv.org/abs/2405.04746
Augmenting Large Language Models (LLMs) with image-understanding capabilities has resulted in a boom of high-performing Vision-Language models (VLMs). While studying the alignment of LLMs to human values has received widespread attention, the safety of VLMs has not received the same attention. In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinct modeling approach. By comparing each VLM to their respective LLM backbone, we find that each VLM is more susceptible to jailbreaking. We consider this as an undesirable outcome from visual instruction-tuning, which imposes a forgetting effect on an LLM's safety guardrails. Therefore, we provide recommendations for future work based on evaluation strategies that aim to highlight the weaknesses of a VLM, as well as take safety measures into account during visual instruction tuning.
https://arxiv.org/abs/2405.04403
Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embeddings to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within the textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpora presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing the LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on a large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach.
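A twin-tower scorer under the setup described (frozen LLM item embeddings feeding trainable projection towers, with dot-product scoring) might be sketched as follows; the mean pooling over history, the single linear layer per tower, and all shapes are assumptions for illustration:

```python
import numpy as np

def twin_tower_scores(user_hist_embs, item_embs, W_user, W_item):
    """Twin-tower scoring sketch (illustrative, not the paper's exact
    architecture). The LLM item embeddings are frozen inputs; only the
    projection matrices W_user and W_item would be trained, supervised
    by the recommendation task.

    user_hist_embs : (n_hist, d_llm)  frozen embeddings of the user's history
    item_embs      : (n_items, d_llm) frozen embeddings of candidate items
    W_user, W_item : (d_llm, d) trainable tower projections
    """
    user_vec = user_hist_embs.mean(axis=0) @ W_user  # user tower -> (d,)
    item_vecs = item_embs @ W_item                   # item tower -> (n_items, d)
    return item_vecs @ user_vec                      # dot-product scores
```

Keeping the LLM frozen means only the two small projections carry gradient updates, which is how the computational-complexity and catastrophic-forgetting concerns above are sidestepped.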
https://arxiv.org/abs/2405.03988
Drug discovery is a complex process that involves sequentially screening and examining a vast array of molecules to identify those with the target properties. This process, also referred to as sequential experimentation, faces challenges due to the vast search space, the rarity of target molecules, and constraints imposed by limited data and experimental budgets. To address these challenges, we introduce a human-in-the-loop framework for sequential experiments in drug discovery. This collaborative approach combines human expert knowledge with deep learning algorithms, enhancing the discovery of target molecules within a specified experimental budget. The proposed algorithm processes experimental data to recommend both promising molecules and those that could improve its performance to human experts. Human experts retain the final decision-making authority based on these recommendations and their domain expertise, including the ability to override algorithmic recommendations. We applied our method to drug discovery tasks using real-world data and found that it consistently outperforms all baseline methods, including those which rely solely on human or algorithmic input. This demonstrates the complementarity between human experts and the algorithm. Our results provide key insights into the levels of humans' domain knowledge, the importance of meta-knowledge, and effective work delegation strategies. Our findings suggest that such a framework can significantly accelerate the development of new vaccines and drugs by leveraging the best of both human and artificial intelligence.
https://arxiv.org/abs/2405.03942