Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information across different domains and objectives for comprehensive user modeling. Nonetheless, practical recommendation systems usually face multiple domains and tasks simultaneously, which current methods cannot address well. To this end, we introduce M3oE, an adaptive multi-domain multi-task mixture-of-experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences, respectively, addressing the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying an AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.
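As a schematic illustration of the three expert modules and the two-level fusion described above, here is a minimal numpy sketch; the dimensions, random linear experts, and softmax gates are illustrative assumptions, not M3oE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoE:
    """One mixture-of-experts module: a soft gate mixes linear expert outputs."""
    def __init__(self, d_in, d_out, n_experts):
        self.experts = [rng.normal(0, 0.1, (d_in, d_out)) for _ in range(n_experts)]
        self.gate = rng.normal(0, 0.1, (d_in, n_experts))

    def __call__(self, x):
        weights = softmax(x @ self.gate)                         # (batch, n_experts)
        outs = np.stack([x @ w for w in self.experts], axis=1)   # (batch, n_experts, d_out)
        return (weights[..., None] * outs).sum(axis=1)           # (batch, d_out)

# Three MoE modules for common, domain-aspect, and task-aspect preferences,
# fused by a second-level gate (the two-level fusion idea, schematically).
d, batch = 16, 4
common, domain, task = MoE(d, d, 4), MoE(d, d, 4), MoE(d, d, 4)
fusion_gate = rng.normal(0, 0.1, (d, 3))

x = rng.normal(size=(batch, d))
parts = np.stack([common(x), domain(x), task(x)], axis=1)  # (batch, 3, d)
alpha = softmax(x @ fusion_gate)                           # (batch, 3)
fused = (alpha[..., None] * parts).sum(axis=1)             # (batch, d)
print(fused.shape)
```

The second-level gate `alpha` is the part an AutoML search could tune dynamically.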
https://arxiv.org/abs/2404.18465
Multi-task learning (MTL) is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to single-task learning (STL), MTL offers a suite of benefits that enhance both the training process and inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the past twenty years, MTL has become widely recognized as a flexible and effective approach in various fields, including computer vision (CV), natural language processing (NLP), recommendation systems, disease prognosis and diagnosis, and robotics. This survey provides a comprehensive overview of the evolution of MTL, encompassing the technical aspects of cutting-edge methods from traditional approaches through deep learning to the latest trend of pretrained foundation models. Our survey methodically categorizes MTL techniques into five key areas: regularization, relationship learning, feature propagation, optimization, and pre-training. This categorization not only chronologically outlines the development of MTL but also dives into various specialized strategies within each category. Furthermore, the survey reveals how MTL has evolved from handling a fixed set of tasks to embracing a more flexible approach free from task or modality constraints. It explores the concepts of task-promptable and task-agnostic training, along with the capacity for zero-shot learning (ZSL), which unleashes the untapped potential of this historically coveted learning paradigm. Overall, we hope this survey provides the research community with a comprehensive overview of the advancements in MTL from its inception in 1997 to the present in 2023. We address present challenges and look ahead to future possibilities, shedding light on the opportunities and potential avenues for MTL research in a broad manner. This project is publicly available at this https URL.
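For readers new to MTL, the classic hard-parameter-sharing setup that much of the surveyed work builds on can be sketched in a few lines; the layer sizes, task names, and random weights below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hard parameter sharing, the canonical MTL architecture: a shared trunk
# learns common features, and lightweight task-specific heads branch off.
d_in, d_hidden, batch = 8, 16, 5
W_shared = rng.normal(0, 0.1, (d_in, d_hidden))
heads = {"task_a": rng.normal(0, 0.1, (d_hidden, 1)),
         "task_b": rng.normal(0, 0.1, (d_hidden, 3))}

x = rng.normal(size=(batch, d_in))
h = np.maximum(x @ W_shared, 0.0)  # shared ReLU features
outputs = {name: h @ W for name, W in heads.items()}

# Joint objective = sum of per-task losses (targets faked for the sketch),
# so gradients from every task shape the shared trunk.
loss = sum(np.mean(out ** 2) for out in outputs.values())
print(outputs["task_a"].shape, outputs["task_b"].shape)
```

This corresponds to the "streamlined model architecture" benefit: one trunk serves all tasks instead of one full model per task.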
https://arxiv.org/abs/2404.18961
Building future wireless systems that support services like digital twins (DTs) is challenging to achieve through advances in conventional technologies such as meta-surfaces. While artificial intelligence (AI)-native networks promise to overcome some limitations of wireless technologies, developments still rely on AI tools like neural networks. Such tools struggle to cope with the non-trivial challenges of the network environment and the growing demands of emerging use cases. In this paper, we revisit the concept of AI-native wireless systems, equipping them with the common sense necessary to transform them into artificial general intelligence (AGI)-native systems. These systems acquire common sense by exploiting different cognitive abilities, such as perception, analogy, and reasoning, which enable them to generalize and deal with unforeseen scenarios. Towards developing the components of such a system, we start by showing how the perception module can be built by abstracting real-world elements into generalizable representations. These representations are then used to create a world model, founded on principles of causality and hyper-dimensional (HD) computing, that aligns with intuitive physics and enables the analogical reasoning that defines common sense. Then, we explain how methods such as integrated information theory play a role in the proposed intent-driven and objective-driven planning methods that maneuver the AGI-native network to take actions. Next, we discuss how an AGI-native network can enable use cases related to human and autonomous agents: a) analogical reasoning for next-generation DTs, b) synchronized and resilient experiences for cognitive avatars, and c) brain-level metaverse experiences like holographic teleportation. Finally, we conclude with a set of recommendations for building AGI-native systems. Ultimately, we envision this paper as a roadmap for the beyond-6G era.
https://arxiv.org/abs/2405.02336
Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved and aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning methods are utilized to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves competitive performance with the retrieval-based CTR models while preserving superior inference efficiency and model compatibility.
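The core idea, a cheap student module imitating expensive retrieved-and-aggregated teacher representations via distillation and contrastive learning, can be sketched as follows; the toy tensors and loss weighting are illustrative assumptions, not ROK's exact objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(anchor, positive, temperature=0.1):
    """Contrastive loss: each anchor should match its own positive
    against the other positives in the batch."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Teacher: expensive retrieved-and-aggregated representations (faked here).
# Student: a cheap knowledge-base embedding that must imitate them, so the
# retrieval step can be dropped at inference time.
teacher = rng.normal(size=(8, 32))
student = teacher + 0.05 * rng.normal(size=(8, 32))  # near-copy for the sketch

distill_loss = np.mean((student - teacher) ** 2)     # knowledge distillation
contrast_loss = info_nce(student, teacher)           # contrastive alignment
total = distill_loss + 0.5 * contrast_loss
print(round(float(distill_loss), 4), round(float(contrast_loss), 6))
```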
https://arxiv.org/abs/2404.18304
Sequential recommendation is an important branch of recommender systems, aiming to provide personalized item recommendations by analyzing and predicting users' ordered historical interaction behaviors. However, along with the growth in user volume and increasingly rich behavioral information, effectively understanding and disentangling users' multiple interaction intentions poses challenges for behavior prediction and sequential recommendation. In light of these challenges, we propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL). In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by multiple current intentions, which means the model needs not only to mine the most relevant implicit intention for each user, but also to mitigate the influence of irrelevant intentions. Therefore, we choose the Variational Auto-Encoder (VAE) to realize the disentanglement of users' multiple intentions, and propose two types of contrastive learning paradigms for finding the most relevant interaction intention of each user and for maximizing the mutual information of positive sample pairs, respectively. Experimental results show that MIDCL not only has significant superiority over most existing baseline methods, but also provides a more interpretable case for research on intention-based prediction and recommendation.
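The VAE-based multi-intention disentanglement can be pictured with a minimal reparameterization sketch; the encoder (random linear maps), the number of intentions K, and all dimensions are illustrative assumptions, not MIDCL's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encode a user's behavior representation into K separate latent
# "intention" distributions, then sample each with the
# reparameterization trick: z = mu + sigma * eps.
K, d_latent, d_in = 4, 8, 16
W_mu = rng.normal(0, 0.1, (d_in, K * d_latent))
W_logvar = rng.normal(0, 0.1, (d_in, K * d_latent))

behavior = rng.normal(size=(1, d_in))        # encoded interaction sequence
mu = (behavior @ W_mu).reshape(K, d_latent)
logvar = (behavior @ W_logvar).reshape(K, d_latent)

eps = rng.normal(size=(K, d_latent))
z = mu + np.exp(0.5 * logvar) * eps          # K disentangled intention vectors

# KL term pushing each intention toward the standard-normal prior,
# the usual VAE regularizer.
kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))
print(z.shape, float(kl) >= 0)
```

A contrastive objective (as in the previous entry's InfoNCE sketch) would then select the most relevant of the K intention vectors per user.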
https://arxiv.org/abs/2404.18214
Numerous industries have benefited from the use of machine learning, and the fashion industry is no exception. By gaining a better understanding of what makes a good outfit, companies can provide useful product recommendations to their users. In this project, we follow two existing approaches that employ graphs to represent outfits and use modified versions of Graph Neural Network (GNN) frameworks. Both the Node-wise Graph Neural Network (NGNN) and the Hypergraph Neural Network (HGNN) aim to score a set of items according to their outfit compatibility. The data used is the Polyvore Dataset, which consists of curated outfits with product images and text descriptions for each product in an outfit. We recreate the analysis on a subset of this data and compare the two existing models on two tasks: Fill In The Blank (FITB), finding an item that completes an outfit, and Compatibility Prediction, estimating the compatibility of different items grouped as an outfit. We replicate the results directionally and find that HGNN does have slightly better performance on both tasks. On top of replicating the results of the two papers, we also use embeddings generated from a vision transformer and observe enhanced prediction accuracy across the board.
https://arxiv.org/abs/2404.18040
Perceptions of hate can vary greatly across cultural contexts. Hate speech (HS) datasets, however, have traditionally been developed by language. This hides potential cultural biases, as one language may be spoken in different countries home to different cultures. In this work, we evaluate cultural bias in HS datasets by leveraging two interrelated cultural proxies: language and geography. We conduct a systematic survey of HS datasets in eight languages and confirm past findings on their English-language bias, but also show that this bias has been steadily decreasing in the past few years. For three geographically-widespread languages -- English, Arabic and Spanish -- we then leverage geographical metadata from tweets to approximate geo-cultural contexts by pairing language and country information. We find that HS datasets for these languages exhibit a strong geo-cultural bias, largely overrepresenting a handful of countries (e.g., US and UK for English) relative to their prominence in both the broader social media population and the general population speaking these languages. Based on these findings, we formulate recommendations for the creation of future HS datasets.
https://arxiv.org/abs/2404.17874
Multi-modal Emotion Recognition in Conversation (MERC) has received considerable attention in various fields, e.g., human-computer interaction and recommendation systems. Most existing works perform feature disentanglement and fusion to extract emotional contextual information from multi-modal features for emotion classification. After revisiting the characteristics of MERC, we argue that long-range contextual semantic information should be extracted in the feature disentanglement stage and that inter-modal semantic information consistency should be maximized in the feature fusion stage. Recent State Space Models (SSMs), notably Mamba, can efficiently model long-range dependencies. Therefore, in this work, we fully exploit the above insights to further improve the performance of MERC. Specifically, on the one hand, in the feature disentanglement stage, we propose a Broad Mamba, which does not rely on a self-attention mechanism for sequence modeling, but uses state space models to compress the emotional representation and utilizes broad learning systems to explore the potential data distribution in broad space. Different from previous SSMs, we design a bidirectional SSM convolution to extract global context information. On the other hand, we design a multi-modal fusion strategy based on probability guidance to maximize the consistency of information between modalities. Experimental results show that the proposed method can overcome the computational and memory limitations of Transformers when modeling long-range contexts, and has great potential to become a next-generation general architecture for MERC.
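A bidirectional state-space scan of the kind Broad Mamba builds on can be sketched as a diagonal linear recurrence run in both directions; the coefficients below are toy values, not learned SSM parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssm_scan(u, a, b):
    """Linear state-space recurrence h_t = a*h_{t-1} + b*u_t (diagonal SSM),
    which carries long-range context in O(T) time instead of O(T^2) attention."""
    h = np.zeros_like(u[0])
    out = []
    for u_t in u:
        h = a * h + b * u_t
        out.append(h.copy())
    return np.stack(out)

# Bidirectional variant: scan the sequence forward and backward and sum,
# so every position sees global (left and right) context.
T, d = 12, 6
u = rng.normal(size=(T, d))
a, b = 0.9 * np.ones(d), np.ones(d)   # toy, stable coefficients

fwd = ssm_scan(u, a, b)
bwd = ssm_scan(u[::-1], a, b)[::-1]
bi = fwd + bwd
print(bi.shape)
```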
https://arxiv.org/abs/2404.17858
The integration of Artificial Intelligence (AI) in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, while few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency for outpatient triage, this study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance, including both within-version response analysis and between-version comparisons. For the within-version analysis, the results indicate that the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and both show moderate consistency (71.2% for 4.0 and 59.6% for 3.5) in their top recommendation. However, the between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating that few recommendations match between the two versions. Also, only 50% of top recommendations match perfectly in the comparisons. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations, while also facilitating the exploration of the potentials and limitations of LLMs in healthcare utilization. Future research may focus on carefully optimizing LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, precisely aligning with the specific needs of effective outpatient triage.
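One plausible way to score top-recommendation consistency, asking the model the same triage question repeatedly and measuring how often the modal answer appears, can be sketched as follows; the protocol details and sample data are invented for illustration, not taken from the study.

```python
from collections import Counter

def top_consistency(runs):
    """Fraction of repeated runs whose top recommendation equals the
    modal (most frequent) top recommendation."""
    counts = Counter(runs)
    _, modal_count = counts.most_common(1)[0]
    return modal_count / len(runs)

# Hypothetical top departments returned by four repeated queries.
runs = ["cardiology", "cardiology", "internal medicine", "cardiology"]
print(top_consistency(runs))  # 0.75
```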
https://arxiv.org/abs/2405.00728
Named entity recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying entities in sentences into pre-defined types. It plays a crucial role in various research fields, including entity linking, question answering, and online product recommendation. Recent studies have shown that incorporating multilingual and multimodal datasets can enhance the effectiveness of NER. This is due to language transfer learning and the presence of shared implicit features across different modalities. However, the lack of a dataset that combines multilingualism and multimodality has hindered research exploring the combination of these two aspects, as multimodality can help NER in multiple languages simultaneously. In this paper, we aim to address a more challenging task: multilingual and multimodal named entity recognition (MMNER), considering its potential value and influence. Specifically, we construct a large-scale MMNER dataset with four languages (English, French, German, and Spanish) and two modalities (text and image). To tackle this challenging MMNER task on the dataset, we introduce a new model called 2M-NER, which aligns the text and image representations using contrastive learning and integrates a multimodal collaboration module to effectively depict the interactions between the two modalities. Extensive experimental results demonstrate that our model achieves the highest F1 score in multilingual and multimodal NER tasks compared to several representative baselines. Additionally, in a challenging analysis, we discovered that sentence-level alignment interferes considerably with NER models, indicating the higher level of difficulty of our dataset.
https://arxiv.org/abs/2404.17122
With the information explosion on the Web, search and recommendation are foundational infrastructures for satisfying users' information needs. As two sides of the same coin, both revolve around the same core research problem: matching queries with documents, or users with items. In recent decades, search and recommendation have experienced synchronous technological paradigm shifts, including machine learning-based and deep learning-based paradigms. Recently, generative large language models have sparked a new paradigm in search and recommendation, i.e., generative search (retrieval) and recommendation, which aims to address the matching problem in a generative manner. In this paper, we provide a comprehensive survey of this emerging paradigm in information systems and summarize the developments in generative search and recommendation from a unified perspective. Rather than simply categorizing existing works, we abstract a unified framework for the generative paradigm and break existing works down into different stages within this framework to highlight their strengths and weaknesses. We then distinguish generative search and recommendation by their unique challenges, identify open problems and future directions, and envision the next information-seeking paradigm.
https://arxiv.org/abs/2404.16924
The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt closed-ended question-answering tasks with answer options for evaluation. However, in real clinical settings, many clinical decisions, such as treatment recommendations, involve answering open-ended questions without pre-set options. Meanwhile, existing studies mainly use accuracy to assess model performance. In this paper, we comprehensively benchmark diverse LLMs in healthcare to clearly understand their strengths and weaknesses. Our benchmark contains seven tasks and thirteen datasets across medical language generation, understanding, and reasoning. We conduct a detailed evaluation of sixteen existing LLMs in healthcare under both zero-shot and few-shot (i.e., 1-, 3-, and 5-shot) learning settings. We report the results on five metrics (i.e., matching, faithfulness, comprehensiveness, generalizability, and robustness) that are critical to earning the trust of clinical users. We further invite medical experts to conduct a human evaluation.
https://arxiv.org/abs/2405.00716
Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work delves into effectively utilizing existing English resources to promote DocRE studies in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset suffer from low recall. We investigate the error cases and attribute the failure to the different surface structures and semantics of documents translated from English and those written by native speakers. We thus switch to exploring whether the transferred dataset can assist human annotation of Japanese documents. In our proposal, annotators edit relation predictions from a model trained on the transferred dataset. Quantitative analysis shows that relation recommendations suggested by the model help reduce approximately 50% of the human edit steps compared with the previous approach. Experiments quantify the performance of existing DocRE models on our collected dataset, portraying the challenges of Japanese and cross-lingual DocRE.
https://arxiv.org/abs/2404.16506
Managing the semantic quality of the categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and its associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is powered by the Convex Hull, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. As a solution to the information loss caused by dimensionality reduction, we formulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function represents a filter built around a contextual category and retrieves items with a certain Reconsideration Probability (RP). Retrieving high-RP items serves as a tool for database administrators to improve data groupings by providing recommendations and identifying outliers within a contextual framework.
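The RP filter described above reduces to a one-line formula, RP = exp(-λ·d), where d is the Euclidean distance to the contextual category's encoding; the decay rate λ and the toy 3-dimensional encodings below are illustrative assumptions.

```python
import math

def reconsideration_probability(category_vec, item_vec, lam=1.0):
    """RP = exp(-lam * d): exponential decay of the Euclidean distance d
    between a contextual category's encoding and an item's encoding."""
    return math.exp(-lam * math.dist(category_vec, item_vec))

# Toy encodings: the closer an item sits to the contextual category,
# the higher its Reconsideration Probability.
center = [0.0, 0.0, 0.0]
near, far = [0.1, 0.0, 0.0], [3.0, 4.0, 0.0]
print(round(reconsideration_probability(center, near), 4))  # exp(-0.1) ≈ 0.9048
print(round(reconsideration_probability(center, far), 4))   # exp(-5.0) ≈ 0.0067
```

Thresholding on RP gives the "filter built around a contextual category": high-RP items are candidate regroupings, very low-RP members are outlier candidates.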
https://arxiv.org/abs/2404.16442
Fuzzing, a widely used technique for bug detection, has seen advancements through Large Language Models (LLMs). Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identify five major challenges of LLM-assisted fuzzing. To support our findings, we revisited the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a remedy, we propose some actionable recommendations to help improve the application of LLMs in fuzzing and conduct preliminary evaluations on DBMS fuzzing. The results demonstrate that our recommendations effectively address the identified challenges.
https://arxiv.org/abs/2404.16297
In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracing through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.
https://arxiv.org/abs/2405.05134
Knowledge Graphs (KGs) are widely employed in artificial intelligence applications, such as question answering and recommendation systems. However, KGs are frequently found to be incomplete. While much of the existing literature focuses on predicting missing nodes for given incomplete KG triples, there remains an opportunity to complete KGs by exploring relations between existing nodes, a task known as relation prediction. In this study, we propose a relation prediction model that harnesses both textual and structural information within KGs. Our approach integrates walk-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.
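The node representation described above, structural walk-based embeddings concatenated with language-model embeddings and fed to a relation classifier, can be sketched as follows; all vectors and the linear scorer are random stand-ins, not the paper's trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each node carries two views: a structural (walk-based) embedding and a
# textual (language-model) embedding. Concatenating them gives the fused
# representation; a linear layer then scores candidate relations for a
# (head, tail) pair.
d_walk, d_text, n_relations = 8, 16, 5
W = rng.normal(0, 0.1, (2 * (d_walk + d_text), n_relations))

def node_repr(walk_emb, text_emb):
    return np.concatenate([walk_emb, text_emb])

head = node_repr(rng.normal(size=d_walk), rng.normal(size=d_text))
tail = node_repr(rng.normal(size=d_walk), rng.normal(size=d_text))
scores = np.concatenate([head, tail]) @ W   # one score per candidate relation
predicted = int(np.argmax(scores))
print(scores.shape, predicted)
```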
https://arxiv.org/abs/2404.16206
Considering the premise that the number of products offered grows exponentially while the amount of data a user can assimilate before making a decision is relatively small, recommender systems help categorize content according to user preferences. Collaborative filtering is a widely used method for computing recommendations due to its good performance. However, this method makes the system vulnerable to attacks that try to bias the recommendations. These attacks, known as 'shilling attacks', are performed to push or nuke an item in the system. This paper proposes an algorithm to accurately detect such shilling profiles in the system and also studies the effects of such profiles on the recommendations.
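What a 'push' shilling profile looks like in a user-item rating matrix can be sketched directly from the definition above; the matrix values, filler strategy, and sizes are toy assumptions, and this illustrates the attack, not the paper's detection algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Injected profiles give the target item the maximum rating and fill the
# remaining items with near-average ratings to blend in with real users.
n_users, n_items, n_fakes, target = 6, 5, 3, 2
ratings = rng.integers(1, 6, size=(n_users, n_items)).astype(float)

fake = np.full((n_fakes, n_items), ratings.mean().round())  # filler: global mean
fake[:, target] = 5.0                                       # push the target item
attacked = np.vstack([ratings, fake])

before = ratings[:, target].mean()
after = attacked[:, target].mean()
print(before, "->", after)  # the target item's average rating can only go up
```

Detection methods typically look for exactly this signature: groups of profiles with unusually uniform filler ratings and an extreme rating on one item.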
https://arxiv.org/abs/2404.16177
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues which use spectral networks or convolutions and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely, gating architectures, structural architectures, and recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available at this https URL.
https://arxiv.org/abs/2404.16112
In current recommendation systems, temporal data shift poses a significant challenge. The presence of data shift prevents the system from simply enhancing the CTR model's adaptability to new data by adding more training data. We observed that although the correlation between features and labels in recommendation systems changes over time, if a fixed search space is established, the relationship between the data and the search space remains invariant. Therefore, we designed a framework that uses retrieval techniques to leverage shifting data for training a relevance network. However, due to the use of BM25 as a retrieval method, this framework is challenging to deploy in online recommendation systems. We therefore designed a distillation method that uses knowledge distillation to transfer knowledge from the relevance network to a parameterized module, the search-distill module. We refer to this entire process as the Retrieval and Distill paradigm (RAD). With the RAD paradigm, we have an effective method for leveraging shifting data to enhance the performance of CTR models. In future work, we aim to incorporate a wider variety of data into the CTR model using RAD; enhancing the performance of the distillation method is another significant area of focus.
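Since BM25 is the retrieval method named above, here is a minimal from-scratch BM25 scorer to make the retrieval step concrete; the corpus, query, and the common k1/b defaults are illustrative, and a production system would use an optimized inverted-index implementation.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula
    (Lucene-style non-negative idf variant)."""
    docs = [doc.split() for doc in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query.split():
            df = sum(1 for d in docs if term in d)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# Toy "shifting data" corpus of past interaction records.
corpus = ["user clicked sports news", "user ignored ads", "sports shoes ad clicked"]
scores = bm25_scores("sports clicked", corpus)
print([round(s, 3) for s in scores])
```

In RAD's spirit, the records retrieved this way would supervise the relevance network, whose knowledge is then distilled into the deployable search-distill module.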
https://arxiv.org/abs/2404.15678