Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While English DocRE has been studied extensively, DocRE in non-English languages has received limited attention. This work explores how to effectively utilize existing English resources to promote DocRE research in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset suffer from low recall. We investigate the error cases and attribute the failure to differences in surface structure and semantics between documents translated from English and those written by native speakers. We therefore instead explore whether the transferred dataset can assist human annotation of Japanese documents. In our proposal, annotators edit relation predictions made by a model trained on the transferred dataset. Quantitative analysis shows that the model's relation recommendations reduce the number of human edit steps by approximately 50% compared with the previous approach. Experiments quantify the performance of existing DocRE models on our collected dataset, illustrating the challenges of Japanese and cross-lingual DocRE.
https://arxiv.org/abs/2404.16506
Managing the semantic quality of categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and their associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is powered by convex hulls, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. To counter the information loss caused by dimensionality reduction, we formulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function acts as a filter built around a contextual category and retrieves items with a certain Reconsideration Probability (RP). Retrieving high-RP items serves as a tool for database administrators to improve data groupings by providing recommendations and identifying outliers within a contextual framework.
https://arxiv.org/abs/2404.16442
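The exponential-decay filter described above is straightforward to sketch. The abstract does not give the exact functional form, so the decay rate and the form RP(x) = exp(-decay * ||x - c||) below are assumptions for illustration:

```python
import numpy as np

def reconsideration_probability(category_center, item_encodings, decay=0.5):
    """Exponential decay filter around a contextual category.

    Assumed form: RP(x) = exp(-decay * ||x - c||), where c is the
    high-dimensional encoding of the contextual category and x an
    item encoding; items nearer the centre get RP closer to 1.
    """
    dists = np.linalg.norm(item_encodings - category_center, axis=1)
    return np.exp(-decay * dists)

# Toy usage: the item near the category centre gets a higher RP.
center = np.zeros(4)
items = np.array([[0.1, 0.0, 0.0, 0.0],
                  [2.0, 2.0, 2.0, 2.0]])
rp = reconsideration_probability(center, items)
```

High-RP items under this sketch would be surfaced as regrouping candidates for the administrator.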
Fuzzing, a widely used technique for bug detection, has seen advancements through Large Language Models (LLMs). Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identify five major challenges of LLM-assisted fuzzing. To support our findings, we revisit the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a remedy, we propose actionable recommendations to improve the application of LLMs to fuzzing and conduct preliminary evaluations on DBMS fuzzing. The results demonstrate that our recommendations effectively address the identified challenges.
https://arxiv.org/abs/2404.16297
Knowledge Graphs (KGs) are widely employed in artificial intelligence applications, such as question answering and recommendation systems. However, KGs are frequently found to be incomplete. While much of the existing literature focuses on predicting missing nodes for given incomplete KG triples, there remains an opportunity to complete KGs by exploring relations between existing nodes, a task known as relation prediction. In this study, we propose a relation prediction model that harnesses both textual and structural information within KGs. Our approach integrates walk-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.
https://arxiv.org/abs/2404.16206
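A minimal sketch of the node-representation idea above, concatenating walk-based and language-model embeddings and scoring candidate relations for a node pair. The embedding sizes, dictionary lookup, and linear scorer are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: a walk-based (structural) and a language-model
# (textual) embedding per node; real systems would learn these.
walk_emb = {"paris": rng.normal(size=8), "france": rng.normal(size=8)}
lm_emb = {"paris": rng.normal(size=16), "france": rng.normal(size=16)}

def node_repr(node):
    # Concatenate the structural and textual views of the node.
    return np.concatenate([walk_emb[node], lm_emb[node]])

def relation_logits(head, tail, W, b):
    # Score every candidate relation from the pair representation.
    pair = np.concatenate([node_repr(head), node_repr(tail)])
    return W @ pair + b

n_relations, dim = 3, 2 * (8 + 16)
W = rng.normal(size=(n_relations, dim))  # untrained, for shape only
b = np.zeros(n_relations)
scores = relation_logits("paris", "france", W, b)
predicted_relation = int(np.argmax(scores))
```

In a trained model, `predicted_relation` would index the relation (e.g. capital-of) filling the gap between the two existing nodes.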
Given that the number of products offered grows exponentially while the amount of information a user can assimilate before making a decision remains relatively small, recommender systems help categorize content according to user preferences. Collaborative filtering is a widely used method for computing recommendations due to its good performance, but it makes the system vulnerable to attacks that try to bias the recommendations. These attacks, known as 'shilling attacks', are performed to push or nuke an item in the system. This paper proposes an algorithm to accurately detect such shilling profiles in the system and studies their effects on the recommendations.
https://arxiv.org/abs/2404.16177
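The abstract does not detail the detection algorithm, so here is a hedged sketch using a classic shilling indicator, Rating Deviation from Mean Agreement (RDMA), which flags profiles whose ratings deviate sharply from item averages; the toy matrix and argmax-based flagging are illustrative only:

```python
import numpy as np

def rdma_scores(ratings):
    """Rating Deviation from Mean Agreement per user profile.

    ratings: users x items matrix with np.nan for missing entries.
    A high RDMA marks a profile whose ratings deviate strongly from
    the item averages -- a classic indicator of a shilling profile.
    """
    item_means = np.nanmean(ratings, axis=0)
    dev = np.abs(ratings - item_means)           # per-rating deviation
    counts = np.sum(~np.isnan(ratings), axis=1)  # ratings per user
    return np.nansum(dev, axis=1) / counts

# Toy matrix: user 2 pushes every item to the maximum rating.
R = np.array([[4.0, 3.0, np.nan],
              [4.0, 2.0, 3.0],
              [5.0, 5.0, 5.0]])
scores = rdma_scores(R)
suspect = int(np.argmax(scores))  # flags the push-attack profile
```

A deployed detector would combine several such profile features and a learned threshold rather than a single argmax.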
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations using spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks, but they still have difficulty dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medicine (including genomics), chemistry (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, the Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available at \url{this https URL}.
https://arxiv.org/abs/2404.16112
In current recommendation systems, temporal data shift poses a significant challenge. The presence of data shift prevents the system from simply enhancing the CTR model's adaptability to new data by adding more training data. We observed that although the correlation between features and labels in recommendation systems changes over time, once a fixed search space is established, the relationship between the data and the search space remains invariant. Therefore, we designed a framework that uses retrieval techniques to leverage shifting data for training a relevance network. However, because it uses BM25 as the retrieval method, this framework is challenging to deploy in online recommendation systems. We therefore designed a distillation method that uses knowledge distillation to transfer knowledge from the relevance network to a parameterized module, the search-distill module. We refer to this entire process as the Retrieval and Distill paradigm (RAD). With the RAD paradigm, we have an effective method for leveraging shifting data to enhance the performance of CTR models. In future work, we aim to incorporate a wider variety of data into the CTR model using RAD; enhancing the performance of the distillation method is also a significant area of focus.
https://arxiv.org/abs/2404.15678
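RAD's search-distill module is not specified in the abstract, but the knowledge-distillation step it relies on is standard. Below is a minimal sketch of the usual temperature-scaled distillation objective (teacher-student KL divergence); all values are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions -- the standard knowledge-distillation objective."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student matching the teacher drives the loss to zero; a
# mismatched student incurs a positive loss.
t = [2.0, 0.5, -1.0]
loss_match = distill_loss(t, t)
loss_off = distill_loss(t, [0.0, 0.0, 0.0])
```

In RAD this loss would be minimized over the parameterized search-distill module so that the expensive BM25-based relevance network can be dropped at serving time.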
Today, mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests by recommending the shortest path, and do not encourage enough users to travel and learn other paths for the benefit of future users. Prior studies focus on one-shot congestion games without considering users' information learning, while our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$ compared with the socially optimal policy in minimizing the long-term social cost. Besides, the myopic policy fails to ensure the correct learning convergence of users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms, as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in the Bayesian persuasion literature do not work, even yielding $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism to hide all information from a selected user group and provide state-dependent probabilistic recommendations to the other user group. Our CHAR successfully ensures a PoA of less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Beyond the parallel network, we further extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of our CHAR.
https://arxiv.org/abs/2404.15599
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performance on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for future culturally aware language technologies. The project page is this https URL . The code and model are at this https URL . The released CultureBank dataset is at this https URL .
https://arxiv.org/abs/2404.15238
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, handling incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
https://arxiv.org/abs/2404.15022
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.
https://arxiv.org/abs/2404.14928
Biometric Verification (BV) systems often exhibit accuracy disparities across different demographic groups, leading to biases in BV applications. Assessing and quantifying these biases is essential for ensuring the fairness of BV systems. However, existing bias evaluation metrics in BV have limitations, such as focusing exclusively on match or non-match error rates, overlooking bias on demographic groups with performance levels falling between the best and worst performance levels, and neglecting the magnitude of the bias present. This paper presents an in-depth analysis of the limitations of current bias evaluation metrics in BV and, through experimental analysis, demonstrates their contextual suitability, merits, and limitations. Additionally, it introduces a novel general-purpose bias evaluation measure for BV, the ``Sum of Group Error Differences (SEDG)''. Our experimental results on controlled synthetic datasets demonstrate the effectiveness of demographic bias quantification when using existing metrics and our own proposed measure. We discuss the applicability of the bias evaluation metrics in a set of simulated demographic bias scenarios and provide scenario-based metric recommendations. Our code is publicly available under \url{this https URL}.
https://arxiv.org/abs/2404.15385
Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained language models, generative information retrieval (GenIR) has emerged as a novel paradigm, gaining increasing attention in recent years. Currently, research in GenIR can be categorized into two aspects: generative document retrieval (GR) and reliable response generation. GR leverages the generative model's parameters for memorizing documents, enabling retrieval by directly generating relevant document identifiers without explicit indexing. Reliable response generation, on the other hand, employs language models to directly generate the information users seek, breaking the limitations of traditional IR in terms of document granularity and relevance matching, offering more flexibility, efficiency, and creativity, and thus better meeting practical needs. This paper aims to systematically review the latest research progress in GenIR. We will summarize the advancements in GR regarding model training, document identifiers, incremental learning, downstream task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation in aspects of internal knowledge memorization, external knowledge augmentation, generating responses with citations, and personal information assistants. We also review the evaluation, challenges, and future prospects of GenIR systems. This review aims to offer a comprehensive reference for researchers in the GenIR field, encouraging further development in this area.
https://arxiv.org/abs/2404.14851
A series of graph filtering (GF)-based collaborative filtering (CF) methods showcase state-of-the-art recommendation accuracy by using a low-pass filter (LPF) without a training process. However, conventional GF-based CF approaches mostly perform matrix decomposition on the item-item similarity graph to realize the ideal LPF, which results in a non-trivial computational cost and thus makes them less practical in scenarios where rapid recommendations are essential. In this paper, we propose Turbo-CF, a GF-based CF method that is both training-free and matrix decomposition-free. Turbo-CF employs a polynomial graph filter to circumvent the issue of expensive matrix decompositions, enabling us to make full use of modern computer hardware components (i.e., GPUs). Specifically, Turbo-CF first constructs an item-item similarity graph whose edge weights are effectively regulated. Then, our own polynomial LPFs are designed to retain only low-frequency signals without explicit matrix decompositions. We demonstrate that Turbo-CF is extremely fast yet accurate, achieving a runtime of less than 1 second on real-world benchmark datasets while achieving recommendation accuracies comparable to the best competitors.
https://arxiv.org/abs/2404.14243
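The decomposition-free polynomial filtering idea can be sketched directly: the filter is a polynomial in the normalized item-item adjacency applied to a user's interaction vector, using only matrix-vector products. The tiny graph and coefficients below are illustrative assumptions, not Turbo-CF's actual filter design:

```python
import numpy as np

def polynomial_low_pass(A_norm, signal, coeffs):
    """Apply sum_k coeffs[k] * A_norm^k @ signal without any matrix
    decomposition -- only matrix-vector products, which map well to GPUs."""
    out = np.zeros_like(signal, dtype=float)
    power = signal.astype(float)  # A_norm^0 @ signal
    for c in coeffs:
        out += c * power
        power = A_norm @ power  # raise the polynomial degree by one
    return out

# Toy item-item similarity graph over 3 items, symmetrically normalized.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

r = np.array([1.0, 0.0, 0.0])  # user interacted with item 0 only
scores = polynomial_low_pass(A_norm, r, coeffs=[1.0, 0.9, 0.5])
```

The un-interacted item most strongly connected to the user's history (item 1 here) receives the highest new score, which is the recommendation signal.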
A recent study has shown that diffusion models are well-suited for modeling the generative process of user-item interactions in recommender systems due to their denoising nature. However, existing diffusion model-based recommender systems do not explicitly leverage high-order connectivities that contain crucial collaborative signals for accurate recommendations. Addressing this gap, we propose CF-Diff, a new diffusion model-based collaborative filtering (CF) method, which is capable of making full use of collaborative signals along with multi-hop neighbors. Specifically, the forward-diffusion process adds random noise to user-item interactions, while the reverse-denoising process accommodates our own learning model, named cross-attention-guided multi-hop autoencoder (CAM-AE), to gradually recover the original user-item interactions. CAM-AE consists of two core modules: 1) the attention-aided AE module, responsible for precisely learning latent representations of user-item interactions while preserving the model's complexity at manageable levels, and 2) the multi-hop cross-attention module, which judiciously harnesses high-order connectivity information to capture enhanced collaborative signals. Through comprehensive experiments on three real-world datasets, we demonstrate that CF-Diff is (a) Superior: outperforming benchmark recommendation methods, achieving remarkable gains up to 7.29% compared to the best competitor, (b) Theoretically-validated: reducing computations while ensuring that the embeddings generated by our model closely approximate those from the original cross-attention, and (c) Scalable: proving the computational efficiency that scales linearly with the number of users or items.
https://arxiv.org/abs/2404.14240
With the recent advances in machine learning, creating agents that behave realistically in simulated air combat has become a growing field of interest. This survey explores the application of machine learning techniques for modeling air combat behavior, motivated by the potential to enhance simulation-based pilot training. Current simulated entities tend to lack realistic behavior, and traditional behavior modeling is labor-intensive and prone to loss of essential domain knowledge between development steps. Advancements in reinforcement learning and imitation learning algorithms have demonstrated that agents may learn complex behavior from data, which could be faster and more scalable than manual methods. Yet, making adaptive agents capable of performing tactical maneuvers and operating weapons and sensors still poses a significant challenge. The survey examines applications, behavior model types, prevalent machine learning methods, and the technical and human challenges in developing adaptive and realistically behaving agents. Another challenge is the transfer of agents from learning environments to military simulation systems and the consequent demand for standardization. Four primary recommendations are presented regarding increased emphasis on beyond-visual-range scenarios, multi-agent machine learning and cooperation, utilization of hierarchical behavior models, and initiatives for standardization and research collaboration. These recommendations aim to address current issues and guide the development of more comprehensive, adaptable, and realistic machine learning-based behavior models for air combat applications.
https://arxiv.org/abs/2404.13954
Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison between their attitudes and support for each goal and those of humans. We examine the potential disparities, primarily focusing on aspects such as understanding and emotions, cultural and regional differences, task objective variations, and factors considered in the decision-making process. These disparities arise from the underrepresentation and imbalance in LLM training data, historical biases, quality issues, lack of contextual understanding, and skewed ethical values reflected. The study also investigates the risks and harms that may arise from neglecting the attitudes of LLMs towards the SDGs, including the exacerbation of social inequalities, racial discrimination, environmental destruction, and resource wastage. To address these challenges, we propose strategies and recommendations to guide and regulate the application of LLMs, ensuring their alignment with the principles and goals of the SDGs, and therefore creating a more just, inclusive, and sustainable future.
https://arxiv.org/abs/2404.13885
Watching movies is one of the social activities typically done in groups. Emotion is the most vital factor that affects movie viewers' preferences. So, the emotional aspect of the movie needs to be determined and analyzed for further recommendations. It can be challenging to choose a movie that appeals to the emotions of a diverse group. Reaching an agreement for a group can be difficult due to the various genres and choices. This paper proposes a novel approach to group movie suggestions by examining emotions from three different channels: movie descriptions (text), soundtracks (audio), and posters (image). We employ the Jaccard similarity index to match each participant's emotional preferences to prospective movie choices, followed by a fuzzy inference technique to determine group consensus. We use a weighted integration process for the fusion of emotion scores from diverse data types. Then, group movie recommendation is based on prevailing emotions and viewers' best-loved movies. After determining the recommendations, the group's consensus level is calculated using a fuzzy inference system, taking participants' feedback as input. Participants (n=130) in the survey were provided with different emotion categories and asked to select the emotions best suited for particular movies (n=12). Comparison results between predicted and actual scores demonstrate the efficiency of using emotion detection for this problem (Jaccard similarity index = 0.76). We explored the relationship between induced emotions and movie popularity as an additional experiment, analyzing emotion distribution in 100 popular movies from the TMDB database. Such systems can potentially improve the accuracy of movie recommendation systems and achieve a high level of consensus among participants with diverse preferences.
https://arxiv.org/abs/2404.13778
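The matching step in the abstract above (Jaccard similarity between a participant's emotion preferences and a movie's emotions, plus weighted fusion of per-channel scores) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, channel weights, and emotion labels are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity index between two emotion-label sets:
    |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def fuse_channel_scores(scores, weights):
    """Weighted integration of emotion scores from the three channels
    (movie description text, soundtrack audio, poster image)."""
    total = sum(weights.values())
    return sum(scores[ch] * w for ch, w in weights.items()) / total

# Hypothetical participant and movie emotion sets:
viewer_emotions = {"joy", "excitement", "fear"}
movie_emotions = {"joy", "fear", "sadness"}
match = jaccard(viewer_emotions, movie_emotions)  # → 0.5

# Hypothetical per-channel emotion scores and weights:
fused = fuse_channel_scores(
    {"text": 0.8, "audio": 0.6, "image": 0.7},
    {"text": 0.5, "audio": 0.3, "image": 0.2},
)  # → 0.72
```

A group recommendation would then rank candidate movies by such fused, per-member scores before the fuzzy-inference consensus step described in the abstract.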
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, when slight changes occur in the parameters of the underlying model, CEs found by existing methods often become invalid for the updated models. The literature lacks a way to certify deterministic robustness guarantees for CEs under model changes, in that existing methods to improve CEs' robustness are heuristic, and the robustness performances are evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees of CEs under the possibly infinite set of plausible model changes $\Delta$. We formalise our robustness notion as the $\Delta$-robustness for CEs, in both binary and multi-class classification settings. We formulate procedures to verify $\Delta$-robustness based on Mixed Integer Linear Programming, using which we further propose two algorithms to generate CEs that are $\Delta$-robust. In an extensive empirical study, we demonstrate how our approach can be used in practice by discussing two strategies for determining the appropriate hyperparameter in our method, and we quantitatively benchmark the CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.
https://arxiv.org/abs/2404.13736
Addressing the global challenge of breast cancer, this research explores the fusion of generative AI, focusing on ChatGPT 3.5 turbo model, and the intricacies of breast cancer risk assessment. The research aims to evaluate ChatGPT's reasoning capabilities, emphasizing its potential to process rules and provide explanations for screening recommendations. The study seeks to bridge the technology gap between intelligent machines and clinicians by demonstrating ChatGPT's unique proficiency in natural language reasoning. The methodology employs a supervised prompt-engineering approach to enforce detailed explanations for ChatGPT's recommendations. Synthetic use cases, generated algorithmically, serve as the testing ground for the encoded rules, evaluating the model's processing prowess. Findings highlight ChatGPT's promising capacity in processing rules comparable to Expert System Shells, with a focus on natural language reasoning. The research introduces the concept of reinforcement explainability, showcasing its potential in elucidating outcomes and facilitating user-friendly interfaces for breast cancer risk assessment.
https://arxiv.org/abs/2404.14454