Personalized outfit recommendation remains a complex challenge, demanding both fashion compatibility understanding and trend awareness. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM). This enables the LLM to extract style and color characteristics from human-curated fashion images, forming the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making process. This creates a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends. Our framework is evaluated on the Polyvore dataset, demonstrating its effectiveness in two key tasks: fill-in-the-blank and complementary item retrieval. These evaluations underline the framework's ability to generate stylish, trend-aligned outfit suggestions, continuously improving through direct feedback. The evaluation results demonstrate that our proposed framework significantly outperforms the base LLM, creating more cohesive outfits. The improved performance in these tasks underscores the proposed framework's potential to enhance the shopping experience with accurate suggestions, proving its effectiveness over vanilla LLM-based outfit generation.
https://arxiv.org/abs/2409.12150
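The outfit recommendation abstract above mentions a "direct preference mechanism using negative examples" without giving its objective. A minimal sketch, assuming this refers to a Direct Preference Optimization (DPO) style loss over one preferred and one negative outfit completion; the function name, argument names, and the `beta` default are hypothetical and not taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one (chosen, rejected) pair of outfit completions.

    All log-probabilities are sequence-level values; the `ref_` ones come from
    a frozen reference model. Names and the beta default are illustrative.
    """
    # Implicit reward margin: how much more the tuned model prefers the chosen
    # completion over the negative example, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the tuned model and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the tuned model shifts probability toward the preferred outfit.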
In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings capable of leveraging valuable knowledge from distributed and isolated datasets is increasingly crucial. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, demonstrating the promise of such methods. Additionally, we evaluate the memorization of different participant training data across various collaborative training settings, including centralized, federated, and incremental training, highlighting their potential risks in leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors influencing the success of collaboratively trained code models. We show that federated learning achieves competitive performance compared to centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating privacy or copyright. Our study further explores effectiveness and memorization patterns in incremental learning, emphasizing the sequence in which individual participant datasets are introduced. We also identify cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with recommendations for practitioners and researchers to optimize multisource datasets, propelling cross-organizational collaboration forward.
https://arxiv.org/abs/2409.12020
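The collaborative training abstract above compares centralized, federated, and incremental settings but does not name an aggregation rule. As an illustration only, the sketch below assumes the common federated averaging scheme, in which a server combines client parameters weighted by local dataset size; `federated_average` and its arguments are hypothetical names.

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style).

    client_weights: list of equal-length lists of floats, one per participant.
    client_sizes:   number of local training examples per participant.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    # Each aggregated parameter is the mean of the clients' values,
    # weighted by how much data each client trained on.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

Only parameters leave each organization in this scheme, not raw code, which is why memorization of participant data still has to be measured separately.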
Hybrid recommender systems, combining item IDs and textual descriptions, offer potential for improved accuracy. However, previous work has largely focused on smaller datasets and model architectures. This paper introduces Flare (Fusing Language models and collaborative Architectures for Recommender Enhancement), a novel hybrid recommender that integrates a language model (mT5) with a collaborative filtering model (Bert4Rec) using a Perceiver network. This architecture allows Flare to effectively combine collaborative and content information for enhanced recommendations. We conduct a two-stage evaluation, first assessing Flare's performance against established baselines on smaller datasets, where it demonstrates competitive accuracy. Subsequently, we evaluate Flare on a larger, more realistic dataset with a significantly larger item vocabulary, introducing new baselines for this setting. Finally, we showcase Flare's inherent ability to support critiquing, enabling users to provide feedback and refine recommendations. We further leverage critiquing as an evaluation method to assess the model's language understanding and its transferability to the recommendation task.
https://arxiv.org/abs/2409.11699
Segmentation of the fetal and maternal structures, particularly in intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time- and cost-consuming and ii) often yields inconsistent results. The utility of automatic segmentation algorithms for biometry has been proven, though existing results remain suboptimal. To push forward advancements in this area, the Grand Challenge on Pubic Symphysis-Fetal Head Segmentation (PSFHS) was held alongside the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to enhance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date with 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions. The scientific community's enthusiastic participation led to the selection of the top 8 out of 179 entries from 193 registrants in the initial phase to proceed to the competition's second stage. These algorithms have elevated the state of the art in automatic PSFHS from intrapartum ultrasound images. A thorough analysis of the results pinpointed ongoing challenges in the field and outlined recommendations for future work. The top solutions and the complete dataset remain publicly available, fostering further advancements in automatic segmentation and biometry for intrapartum ultrasound imaging.
https://arxiv.org/abs/2409.10980
Large Language Model (LLM)-based recommendation systems provide more comprehensive recommendations than traditional systems by deeply analyzing content and user behavior. However, these systems often exhibit biases, favoring mainstream content while marginalizing non-traditional options due to skewed training data. This study investigates the intricate relationship between bias and LLM-based recommendation systems, with a focus on music, song, and book recommendations across diverse demographic and cultural groups. Through a comprehensive analysis conducted over different LLM-models, this paper evaluates the impact of bias on recommendation outcomes. Our findings reveal that bias is so deeply ingrained within these systems that even a simpler intervention like prompt engineering can significantly reduce bias, underscoring the pervasive nature of the issue. Moreover, factors like intersecting identities and contextual information, such as socioeconomic status, further amplify these biases, demonstrating the complexity and depth of the challenges faced in creating fair recommendations across different groups.
https://arxiv.org/abs/2409.10825
Emojis have become an integral part of digital communication, enriching text by conveying emotions, tone, and intent. Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text. However, they ignore the essence of users' behavior on social media in that each text can correspond to multiple reasonable emojis. To better assess a model's ability to align with such real-world emoji usage, we propose a new semantics preserving evaluation framework for emoji recommendation, which measures a model's ability to recommend emojis that maintain the semantic consistency with the user's text. To evaluate how well a model preserves semantics, we assess whether the predicted affective state, demographic profile, and attitudinal stance of the user remain unchanged. If these attributes are preserved, we consider the recommended emojis to have maintained the original semantics. The advanced abilities of Large Language Models (LLMs) in understanding and generating nuanced, contextually relevant output make them well-suited for handling the complexities of semantics preserving emoji recommendation. To this end, we construct a comprehensive benchmark to systematically assess the performance of six proprietary and open-source LLMs using different prompting techniques on our task. Our experiments demonstrate that GPT-4o outperforms other LLMs, achieving a semantics preservation score of 79.23%. Additionally, we conduct case studies to analyze model biases in downstream classification tasks and evaluate the diversity of the recommended emojis.
https://arxiv.org/abs/2409.10760
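The emoji recommendation abstract above treats a recommendation as semantics-preserving when the predicted affective state, demographic profile, and attitudinal stance stay unchanged. A minimal sketch of such a score, assuming it is the fraction of texts for which every attribute label is unchanged; the exact aggregation used in the paper is not stated, and the function name and data layout are hypothetical.

```python
def semantics_preservation_score(original_labels, recommended_labels):
    """Fraction of texts whose classifier labels are all unchanged.

    Each element is a tuple of predicted labels, e.g.
    (affective_state, demographic_profile, attitudinal_stance), obtained
    from downstream classifiers on the text with the user's original emoji
    and on the same text with the recommended emoji.
    """
    if not original_labels:
        raise ValueError("need at least one example")
    # A text counts as preserved only if every attribute matches.
    preserved = sum(
        1 for orig, rec in zip(original_labels, recommended_labels) if orig == rec
    )
    return preserved / len(original_labels)
```

Under this definition a model is rewarded for any emoji that leaves the downstream predictions intact, not only for reproducing the exact emoji the user chose.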
This paper presents a diffusion-based recommender system that incorporates classifier-free guidance. Most current recommender systems provide recommendations using conventional methods such as collaborative or content-based filtering. Diffusion is a new approach to generative AI that improves on previous generative AI approaches such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). We incorporate diffusion in a recommender system that mirrors the sequence users take when browsing and rating items. Although a few current recommender systems incorporate diffusion, they do not incorporate classifier-free guidance, a new innovation in diffusion models as a whole. In this paper, we present a diffusion recommender system that augments the underlying recommender system model for improved performance and also incorporates classifier-free guidance. Our findings show improvements over state-of-the-art recommender systems for most metrics for several recommendation tasks on a variety of datasets. In particular, our approach demonstrates the potential to provide better recommendations when data is sparse.
https://arxiv.org/abs/2409.10494
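The diffusion recommender abstract above relies on classifier-free guidance. In its usual form, the sampler blends a conditional and an unconditional noise prediction at every denoising step. The sketch below uses the convention in which a guidance scale of 1 returns the conditional prediction unchanged; the function and argument names are illustrative, and how the paper conditions its model is not specified in the abstract.

```python
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    eps_cond:   model output when given the conditioning signal.
    eps_uncond: model output with the conditioning dropped.
    guidance_scale: 0 gives the unconditional prediction, 1 the conditional
    one, and values above 1 extrapolate further toward the condition.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

The same network produces both predictions, which is why training randomly drops the conditioning signal for a fraction of examples.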
Implicit feedback, often used to build recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns, such as higher loss values, and mitigating the noise through sample dropping or reweighting. Despite the progress, we observe existing approaches struggle to distinguish hard samples and noise samples, as they often exhibit similar patterns, thereby limiting their effectiveness in denoising recommendations. To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework. Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified based on summarized historical user interactions. The resulting scores are used to assess the hardness of samples for the pointwise or pairwise training objectives. To ensure efficiency, we introduce a variance-based sample pruning strategy to filter potential hard samples before scoring. Besides, we propose an iterative preference update module designed to continuously refine summarized user preference, which may be biased due to false-positive user-item interactions. Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.
https://arxiv.org/abs/2409.10343
Causality is receiving increasing attention from the artificial intelligence and machine learning communities. This paper gives an example of modelling a recommender system problem using causal graphs. Specifically, we approached the causal discovery task to learn a causal graph by combining observational data from an open-source dataset with prior knowledge. The resulting causal graph shows that only a few variables effectively influence the analysed feedback signals. This contrasts with the recent trend in the machine learning community to include more and more variables in massive models, such as neural networks.
https://arxiv.org/abs/2409.10271
This paper intends to address the challenge of personalized recipe recommendation in the realm of diverse culinary preferences. The problem domain involves recipe recommendations, utilizing techniques such as association analysis and classification. Association analysis explores the relationships and connections between different ingredients to enhance the user experience. Meanwhile, the classification aspect involves categorizing recipes based on user-defined ingredients and preferences. A unique aspect of the paper is the consideration of recipes and ingredients belonging to multiple classes, recognizing the complexity of culinary combinations. This necessitates a sophisticated approach to classification and recommendation, ensuring the system accommodates the nature of recipe categorization. The paper seeks not only to recommend recipes but also to explore the process involved in achieving accurate and personalized recommendations.
https://arxiv.org/abs/2409.10267
Cerebrovascular disease often requires multiple imaging modalities for accurate diagnosis, treatment, and monitoring. Computed Tomography Angiography (CTA) and Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) are two common non-invasive angiography techniques, each with distinct strengths in accessibility, safety, and diagnostic accuracy. While CTA is more widely used in acute stroke due to its faster acquisition times and higher diagnostic accuracy, TOF-MRA is preferred for its safety, as it avoids radiation exposure and contrast agent-related health risks. Despite the predominant role of CTA in clinical workflows, there is a scarcity of open-source CTA data, limiting the research and development of AI models for tasks such as large vessel occlusion detection and aneurysm segmentation. This study explores diffusion-based image-to-image translation models to generate synthetic CTA images from TOF-MRA input. We demonstrate the modality conversion from TOF-MRA to CTA and show that diffusion models outperform a traditional U-Net-based approach. Our work compares different state-of-the-art diffusion architectures and samplers, offering recommendations for optimal model performance in this cross-modality translation task.
https://arxiv.org/abs/2409.10089
The advent of foundation models (FMs) such as large language models (LLMs) has led to a cultural shift in data science, both in medicine and beyond. This shift involves moving away from specialized predictive models trained for specific, well-defined domain questions to generalist FMs pre-trained on vast amounts of unstructured data, which can then be adapted to various clinical tasks and questions. As a result, the standard data science workflow in medicine has been fundamentally altered; the foundation model lifecycle (FMLC) now includes distinct upstream and downstream processes, in which computational resources, model and data access, and decision-making power are distributed among multiple stakeholders. At their core, FMs are fundamentally statistical models, and this new workflow challenges the principles of Veridical Data Science (VDS), hindering the rigorous statistical analysis expected in transparent and scientifically reproducible data science practices. We critically examine the medical FMLC in light of the core principles of VDS: predictability, computability, and stability (PCS), and explain how it deviates from the standard data science workflow. Finally, we propose recommendations for a reimagined medical FMLC that expands and refines the PCS principles for VDS including considering the computational and accessibility constraints inherent to FMs.
https://arxiv.org/abs/2409.10580
Federated learning holds great potential for enabling large-scale healthcare research and collaboration across multiple centres while ensuring data privacy and security are not compromised. Although numerous recent studies suggest or utilize federated learning based methods in healthcare, it remains unclear which ones have potential clinical utility. This review paper considers and analyzes the most recent studies up to May 2024 that describe federated learning based methods in healthcare. After a thorough review, we find that the vast majority are not appropriate for clinical use due to their methodological flaws and/or underlying biases which include but are not limited to privacy concerns, generalization issues, and communication costs. As a result, the effectiveness of federated learning in healthcare is significantly compromised. To overcome these challenges, we provide recommendations and promising opportunities that might be implemented to resolve these problems and improve the quality of model development in federated learning with healthcare.
https://arxiv.org/abs/2409.09727
As minimally verbal autistic (MVA) children communicate with parents through few words and nonverbal cues, parents often struggle to encourage their children to express subtle emotions and needs and to grasp their nuanced signals. We present AACessTalk, a tablet-based, AI-mediated communication system that facilitates meaningful exchanges between an MVA child and a parent. AACessTalk provides real-time guides to the parent to engage the child in conversation and, in turn, recommends contextual vocabulary cards to the child. Through a two-week deployment study with 11 MVA child-parent dyads, we examine how AACessTalk fosters everyday conversation practice and mutual engagement. Our findings show high engagement from all dyads, leading to increased frequency of conversation and turn-taking. AACessTalk also encouraged parents to explore their own interaction strategies and empowered the children to have more agency in communication. We discuss the implications of designing technologies for balanced communication dynamics in parent-MVA child interaction.
https://arxiv.org/abs/2409.09641
Owing to their unprecedented capability in semantic understanding and logical reasoning, pre-trained large language models (LLMs) have shown great potential for developing next-generation recommender systems (RSs). However, the static index paradigm adopted by current methods greatly restricts the use of LLMs' capacity for recommendation, leading not only to insufficient alignment between semantic and collaborative knowledge, but also to the neglect of high-order user-item interaction patterns. In this paper, we propose the Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS to adopt a dynamic semantic index paradigm, aiming to resolve the above problems simultaneously. More specifically, we devise for the first time a dynamic knowledge fusion framework that integrates a twin-tower semantic token generator into the LLM-based recommender, hierarchically allocating meaningful semantic indices to items and users, and accordingly predicting the semantic index of the target item. Furthermore, a dual-modality variational auto-encoder is proposed to facilitate multi-grained alignment between semantic and collaborative knowledge. Finally, a series of novel tuning tasks, customized for capturing high-order user-item interaction patterns, is proposed to take advantage of users' historical behavior. Extensive experiments across three public datasets demonstrate the superiority of the proposed methodology in developing LLM-based generative RSs. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG, compared with the leading baseline methods.
https://arxiv.org/abs/2409.09253
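The TTDS abstract above reports gains in Hit-Rate and NDCG. For readers unfamiliar with these metrics, here is a minimal sketch for the common leave-one-out setting, assuming a single held-out target item per user; the cutoff `k` and the function names are illustrative, and the abstract does not state the evaluation protocol.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item.

    The ideal DCG is 1 (target at rank 1), so the score reduces to
    1 / log2(rank + 1) for a target found at 1-based position `rank`.
    """
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

Averaging both functions over all test users gives the dataset-level figures; Hit-Rate only checks presence in the list, while NDCG also rewards placing the target near the top.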
Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. Although these relationships provide valuable insights for predictions, new music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods. While previous approaches have relied on hand-crafted audio features for this purpose, we explore the use of contrastively pretrained neural audio embedding models, which offer a richer and more nuanced representation of music. Our experiments demonstrate that neural embeddings, particularly those generated with the Contrastive Language-Audio Pretraining (CLAP) model, present a promising approach to enhancing music recommendation tasks within graph-based frameworks.
https://arxiv.org/abs/2409.09026
Click-through-rate (CTR) prediction plays an important role in online advertising and ad recommender systems. In the past decade, maximizing CTR has been the main focus of model development and solution creation. Therefore, researchers and practitioners have proposed various models and solutions to enhance the effectiveness of CTR prediction. Most of the existing literature focuses on capturing either implicit or explicit feature interactions. Although implicit interactions are successfully captured in some studies, explicit interactions present a challenge for achieving high CTR by extracting both low-order and high-order feature interactions. Unnecessary and irrelevant features may cause high computational time and low prediction performance. Furthermore, certain features may perform well with specific predictive models while underperforming with others. Also, feature distribution may fluctuate due to traffic variations. Most importantly, in live production environments, resources are limited, and the time for inference is just as crucial as training time. Because of all these reasons, feature selection is one of the most important factors in enhancing CTR prediction model performance. Simple filter-based feature selection algorithms do not perform well and they are not sufficient. An effective and efficient feature selection algorithm is needed to consistently filter the most useful features during live CTR prediction process. In this paper, we propose a heuristic algorithm named Neighborhood Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR prediction performance while reducing dimensionality and training time costs. We conduct comprehensive experiments on three public datasets to validate the efficiency and effectiveness of our proposed solution.
https://arxiv.org/abs/2409.08703
Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data (text and audio) into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders encounter limitations such as the cold-start problem, and recent advancements in LLMs, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which enhances efficiency without compromising performance. The ATFLRec framework is proposed to integrate audio and text modalities into a multimodal recommendation system, utilizing various LoRA configurations and modality fusion techniques. Results indicate that ATFLRec outperforms baseline models, including traditional and graph neural network-based approaches, achieving higher AUC scores. Furthermore, separate fine-tuning of audio and text data with distinct LoRA modules yields optimal performance, with different pooling methods and Mel filter bank numbers significantly impacting performance. This research offers valuable insights into optimizing multimodal recommender systems and advancing the integration of diverse data modalities in LLMs.
https://arxiv.org/abs/2409.08543
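The ATFLRec abstract above uses Low-Rank Adaptation (LoRA) to keep fine-tuning cheap. The core idea is to freeze a pretrained weight matrix W and learn a low-rank update, so the effective weight is W + (alpha / r) * B * A. The plain-Python sketch below illustrates that formula only; the helper names and the tiny matrices in the test are hypothetical, and the specific LoRA configurations in the paper are not given in the abstract.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A.

    w:      frozen pretrained weight, shape d_out x d_in.
    lora_a: trainable matrix A, shape r x d_in.
    lora_b: trainable matrix B, shape d_out x r.
    alpha:  scaling hyperparameter; r is the adapter rank.
    """
    r = len(lora_a)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update, shape d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

Only A and B are trained, so the number of trainable parameters is r * (d_in + d_out) instead of d_in * d_out, which is what makes separate adapters per modality affordable.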
Knowledge tagging for questions is vital in modern intelligent educational applications, including learning progress diagnosis, practice question recommendations, and course content organization. Traditionally, these annotations have been performed by pedagogical experts, as the task demands not only a deep semantic understanding of question stems and knowledge definitions but also a strong ability to link problem-solving logic with relevant knowledge concepts. With the advent of advanced natural language processing (NLP) algorithms, such as pre-trained language models and large language models (LLMs), pioneering studies have explored automating the knowledge tagging process using various machine learning models. In this paper, we investigate the use of a multi-agent system to address the limitations of previous algorithms, particularly in handling complex cases involving intricate knowledge definitions and strict numerical constraints. By demonstrating its superior performance on the publicly available math question knowledge tagging dataset, MathKnowCT, we highlight the significant potential of an LLM-based multi-agent system in overcoming the challenges that previous methods have encountered. Finally, through an in-depth discussion of the implications of automating knowledge tagging, we underscore the promising results of deploying LLM-based algorithms in educational contexts.
https://arxiv.org/abs/2409.08406
Clustering artworks based on style has many potential real-world applications like art recommendations, style-based search and retrieval, and the study of artistic style evolution in an artwork corpus. However, clustering artworks based on style is largely an unaddressed problem. A few present methods for clustering artworks principally rely on generic image feature representations derived from deep neural networks and do not specifically deal with the artistic style. In this paper, we introduce and deliberate over the notion of style-based clustering of visual artworks. Our main objective is to explore neural feature representations and architectures that can be used for style-based clustering and observe their impact and effectiveness. We develop different methods and assess their relative efficacy for style-based clustering through qualitative and quantitative analysis by applying them to four artwork corpora and four curated synthetically styled datasets. Our analysis provides some key novel insights on architectures, feature representations, and evaluation methods suitable for style-based clustering.
https://arxiv.org/abs/2409.08245