In recommendation systems, a large portion of the ratings are missing due to selection bias, a phenomenon known as Missing Not At Random (MNAR). Counterfactual inverse propensity scoring (IPS) has been used to weight the imputation error of every observed rating. Although effective in multiple scenarios, we argue that the performance of IPS estimation is limited by the uncertainty miscalibration of propensity estimation. In this paper, we propose calibrating the uncertainty of propensity estimation in recommendation systems with multiple representative uncertainty calibration techniques. Theoretical analysis of the bias and generalization bound shows the superiority of the calibrated IPS estimator over the uncalibrated one. Experimental results on the Coat and Yahoo datasets show that uncertainty calibration is improved and hence brings better recommendation results.
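As a sketch of the two ingredients involved, the snippet below pairs a plain IPS estimate with a one-dimensional Platt-style calibrator for propensity scores. The gradient-descent fit and its hyperparameters are illustrative stand-ins for the calibration techniques the paper surveys, not the authors' implementation.

```python
import numpy as np

def ips_estimate(errors, observed, propensities):
    """Inverse-propensity-scored estimate of the average imputation error.

    errors:       error of each (user, item) pair, shape (n,)
    observed:     1.0 if the rating was observed, else 0.0, shape (n,)
    propensities: estimated probability that each pair is observed, shape (n,)
    """
    return np.mean(observed * errors / propensities)

def platt_calibrate(raw_scores, observed):
    """One-dimensional Platt-style calibration of propensity scores.

    Fits p(observed) = sigmoid(a * logit(raw) + b) by plain gradient
    descent on the log loss -- a minimal stand-in for the calibration
    techniques discussed in the abstract.
    """
    logits = np.log(raw_scores / (1 - raw_scores))
    a, b = 1.0, 0.0
    for _ in range(500):
        p = 1 / (1 + np.exp(-(a * logits + b)))
        grad = p - observed                # dLoss/dz for the log loss
        a -= 0.1 * np.mean(grad * logits)
        b -= 0.1 * np.mean(grad)
    return 1 / (1 + np.exp(-(a * logits + b)))
```

With propensities that are systematically too low or too high, the calibrated scores achieve a lower log loss against the observation indicators, which is exactly the miscalibration the paper targets.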
https://arxiv.org/abs/2303.12973
We propose a content-based system for matching video and background music. The system aims to address the challenges of music recommendation for new users or new music given short-form videos. To this end, we propose a cross-modal framework, VMCML, that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage CosFace, a margin-based cosine similarity loss. Furthermore, we establish a large-scale dataset called MSVD, in which we provide 390 individual music tracks and the corresponding 150,000 matched videos. We conduct extensive experiments on the Youtube-8M and our MSVD datasets. Our quantitative and qualitative results demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art video and music matching performance.
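CosFace's additive cosine margin can be sketched in a few lines. The scale `s` and margin `m` defaults below are illustrative, not the paper's settings, and class "prototype" vectors stand in for the learned classifier weights.

```python
import numpy as np

def cosface_loss(embeddings, prototypes, labels, s=16.0, m=0.2):
    """Large-margin cosine (CosFace-style) loss.

    embeddings: (batch, dim) sample embeddings
    prototypes: (classes, dim) class weight vectors
    labels:     (batch,) integer class labels
    s: scale applied to cosines; m: additive cosine margin.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cos = e @ w.T                                   # (batch, classes)
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m      # margin on the true class only
    logits = s * (cos - margin)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])
```

Subtracting `m` from the true-class cosine forces matched video-music pairs to be separated by a margin in angle, which is what makes the shared embedding space discriminative.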
https://arxiv.org/abs/2303.12379
High-precision, rapid detection of pathologies on chest X-rays makes it possible to detect the development of pneumonia at an early stage and begin immediate treatment. Artificial intelligence can speed up and qualitatively improve the procedure of X-ray analysis and give the doctor recommendations for additional consideration of suspicious images. The purpose of this study is to determine the best models and implementations of the transfer learning method for the binary classification problem in the presence of a small amount of training data. In this article, various methods of augmenting the initial data and approaches to training ResNet and DenseNet models on black-and-white X-ray images are considered, and the approaches that yield the highest accuracy in distinguishing cases of pneumonia from the norm at the testing stage are identified.
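A minimal, dependency-free sketch of the kind of augmentation such a study compares is shown below. Real pipelines typically use small-angle rotations, crops, and elastic deformations; 90-degree rotations are used here only to stay NumPy-only, and the probability and brightness range are illustrative.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip, rotate (in 90-degree steps), and shift brightness.

    image: 2-D grayscale array with values in [0, 1].
    rng:   a numpy.random.Generator.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)               # random horizontal flip
    image = np.rot90(image, k=int(rng.integers(0, 4)))
    image = np.clip(image + rng.uniform(-0.1, 0.1), 0.0, 1.0)
    return image
```

Applied on the fly during training, such transforms effectively enlarge a small X-ray dataset without collecting new images.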
https://arxiv.org/abs/2303.10601
This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.
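To make the de-identification step concrete, here is a toy sketch over a header modelled as a plain dict. The tag list is a tiny illustrative subset; a real implementation would follow the DICOM PS3.15 Annex E confidentiality profiles over actual DICOM attributes (for example via pydicom).

```python
# Hypothetical subset of identifying attributes, for illustration only.
DIRECT_IDENTIFIERS = {"PatientName", "PatientID", "PatientBirthDate",
                      "InstitutionName", "ReferringPhysicianName"}

def deidentify(dataset):
    """Return a copy of a header (modelled as a dict) with direct
    identifiers removed and dates coarsened to the year."""
    clean = {}
    for tag, value in dataset.items():
        if tag in DIRECT_IDENTIFIERS:
            continue                              # drop direct identifiers outright
        if tag.endswith("Date") and len(str(value)) == 8:
            value = str(value)[:4] + "0101"       # keep year only (DICOM YYYYMMDD)
        clean[tag] = value
    return clean
```

Real profiles also cover burned-in pixel annotations, private tags, and UIDs, which a dict sketch cannot show.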
https://arxiv.org/abs/2303.10473
Educational technology innovations developed on the basis of large language models (LLMs) have shown the potential to automate the laborious process of generating and analysing textual content. While various innovations have been developed to automate a range of educational tasks (e.g., question generation, feedback provision, and essay grading), there are concerns regarding the practicality and ethicality of these innovations. Such concerns may hinder future research and the adoption of LLM-based innovations in authentic educational contexts. To address this, we conducted a systematic literature review of 118 peer-reviewed papers published since 2017 to pinpoint the current state of research on using LLMs to automate and support educational tasks. The practical and ethical challenges of LLM-based innovations were also identified by assessing their technological readiness, model performance, replicability, system transparency, privacy, equality, and beneficence. The findings were summarised into three recommendations for future studies: updating existing innovations with state-of-the-art models (e.g., GPT-3), embracing the initiative of open-sourcing models/systems, and adopting a human-centred approach throughout the developmental process. These recommendations could support future research to develop practical and ethical innovations for supporting diverse educational tasks and benefiting students, teachers, and institutions.
https://arxiv.org/abs/2303.13379
The impacts of link recommendations on social networks are challenging to evaluate, and so far they have been studied in limited settings. Observational studies are restricted in the kinds of causal questions they can answer and naive A/B tests often lead to biased evaluations due to unaccounted network interference. Furthermore, evaluations in simulation settings are often limited to static network models that do not take into account the potential feedback loops between link recommendation and organic network evolution. To this end, we study the impacts of recommendations on social networks in dynamic settings. Adopting a simulation-based approach, we consider an explicit dynamic formation model -- an extension of the celebrated Jackson-Rogers model -- and investigate how link recommendations affect network evolution over time. Empirically, we find that link recommendations have surprising delayed and indirect effects on the structural properties of networks. Specifically, we find that link recommendations can exhibit considerably different impacts in the immediate term and in the long term. For instance, we observe that friend-of-friend recommendations can have an immediate effect in decreasing degree inequality, but in the long term, they can make the degree distribution substantially more unequal. Moreover, we show that the effects of recommendations can persist in networks, in part due to their indirect impacts on natural dynamics even after recommendations are turned off. We show that, in counterfactual simulations, removing the indirect effects of link recommendations can make the network trend faster toward what it would have been under natural growth dynamics.
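The kind of dynamic simulation described above can be sketched with a much-simplified Jackson-Rogers-style growth rule, in which `fof_prob` stands in for the strength of friend-of-friend recommendation; the paper's model and its calibration are considerably richer.

```python
import random

def grow_network(steps, fof_prob, seed=0):
    """Toy Jackson-Rogers-style growth. Each newcomer links to one node
    found uniformly at random and then, with probability fof_prob, to a
    random friend of that node (the "network-based meeting" step that a
    friend-of-friend recommender amplifies). Returns adjacency as a
    dict mapping node -> set of neighbours."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, steps + 2):
        anchor = rng.choice(list(adj))        # uniform-random meeting
        targets = {anchor}
        if rng.random() < fof_prob:
            friends = list(adj[anchor])
            if friends:
                targets.add(rng.choice(friends))  # friend-of-friend meeting
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj
```

Raising `fof_prob` funnels new links toward already well-connected nodes, which is the preferential-attachment-like mechanism behind the long-run degree inequality the paper observes.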
https://arxiv.org/abs/2303.09700
We introduce a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists based on patient responses. The system uses Deep Reinforcement Learning (DRL) to generate multi-objective policies for four different psychiatric conditions: anxiety, depression, schizophrenia, and suicidal cases. We present our experimental results on the accuracy of recommended topics using three different scales of working alliance ratings: task, bond, and goal. We show that the system is able to capture the real data (historical topics discussed by the therapists) relatively well, and that the best performing models vary by disorder and rating scale. To gain interpretable insights into the learned policies, we visualize policy trajectories in a 2D principal component analysis space and transition matrices. These visualizations reveal distinct patterns in the policies trained with different reward signals and trained on different clinical diagnoses. Our system's success in generating DIsorder-Specific Multi-Objective Policies (DISMOP) and interpretable policy dynamics demonstrates the potential of DRL in providing personalized and efficient therapeutic recommendations.
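One of the two visualisations is easy to reproduce in miniature: a row-normalised first-order transition matrix over the sequence of discussed topics. This is a generic sketch of that analysis step, not the paper's code.

```python
import numpy as np

def transition_matrix(topic_sequence, n_topics):
    """Row-normalised first-order transition matrix over discussed topics.

    topic_sequence: list of integer topic ids, in session order.
    Rows with no outgoing transitions are left as zeros.
    """
    counts = np.zeros((n_topics, n_topics))
    for a, b in zip(topic_sequence, topic_sequence[1:]):
        counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts),
                     where=totals > 0)
```

Comparing such matrices across reward signals (task, bond, goal) and diagnoses is one way to see the distinct policy patterns the abstract mentions.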
https://arxiv.org/abs/2303.09601
Achieving human-like communication with machines remains a classic, challenging topic in the fields of Knowledge Representation and Reasoning and Natural Language Processing. Large Language Models (LLMs) rely on pattern-matching rather than a true understanding of the semantic meaning of a sentence. As a result, they may generate incorrect responses. To generate an assuredly correct response, one has to "understand" the semantics of a sentence. To achieve this "understanding", logic-based (commonsense) reasoning methods such as Answer Set Programming (ASP) are arguably needed. In this paper, we describe the AutoConcierge system, which leverages LLMs and ASP to develop a conversational agent that can truly "understand" human dialogs in restricted domains. AutoConcierge is focused on a specific domain: advising users about restaurants in their local area based on their preferences. AutoConcierge interactively understands a user's utterances, identifies the missing information in them, and requests the user, via a natural language sentence, to provide it. Once AutoConcierge has determined that all the information has been received, it computes a restaurant recommendation based on the user preferences it has acquired from the human user. AutoConcierge is based on our earlier STAR framework, which uses GPT-3 to convert human dialogs into predicates that capture the deep structure of the dialog's sentences. These predicates are then input into the goal-directed s(CASP) ASP system for performing commonsense reasoning. To the best of our knowledge, AutoConcierge is the first automated conversational agent that can realistically converse like a human and provide help to humans based on truly understanding human utterances.
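The slot-filling loop described above ("identify the missing information and request it") can be sketched without any LLM or ASP machinery. The slot names and question wording below are hypothetical, purely for illustration of the control flow.

```python
# Hypothetical required slots and follow-up questions for illustration.
REQUIRED_SLOTS = {
    "cuisine": "What kind of food would you like?",
    "price_range": "Roughly how much would you like to spend?",
    "location": "Which part of town should the restaurant be in?",
}

def next_question(known_slots):
    """Return the follow-up question for the first missing slot, or None
    once every slot needed for a recommendation has been filled.

    known_slots: dict of slot name -> value extracted so far (in the real
    system, from GPT-3-produced predicates fed to s(CASP)).
    """
    for slot, question in REQUIRED_SLOTS.items():
        if slot not in known_slots:
            return question
    return None
```

In the actual system the "extracted so far" state comes from predicates produced by the LLM, and the decision of what is still missing is made by commonsense reasoning rather than a fixed checklist.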
https://arxiv.org/abs/2303.08941
Post-training quantization (PTQ) has recently been shown to be a promising method to reduce the memory consumption and/or compute cost of large language models. However, a comprehensive study of the effects of different quantization schemes, different model families, different PTQ methods, different quantization bit precisions, etc., is still missing. In this work, we provide an extensive study of those components over tens of thousands of zero-shot experiments. Our results show that (1) fine-grained quantization and PTQ methods (instead of naive round-to-nearest quantization) are necessary to achieve good accuracy and (2) higher bits (e.g., 5 bits) with coarse-grained quantization are more powerful than lower bits (e.g., 4 bits) with very fine-grained quantization (whose effective bit count is similar to 5 bits). We also present recommendations about how to utilize quantization for LLMs of different sizes, and leave suggestions of future opportunities and system work that are not resolved in this work.
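The contrast between coarse- and fine-grained round-to-nearest quantization can be reproduced in a few lines; the group size and the injected outlier in the usage below are illustrative choices.

```python
import numpy as np

def quantize_rtn(w, bits, group_size=None):
    """Symmetric round-to-nearest quantization; returns dequantized weights.

    With group_size=None the whole tensor shares one scale (coarse-grained);
    with a small group_size each block gets its own scale (fine-grained),
    which usually lowers the reconstruction error when outliers are present.
    """
    w = np.asarray(w, dtype=np.float64).ravel()
    group = len(w) if group_size is None else group_size
    qmax = 2 ** (bits - 1) - 1
    out = np.empty_like(w)
    for start in range(0, len(w), group):
        block = w[start:start + group]
        scale = max(np.abs(block).max() / qmax, 1e-12)
        q = np.round(np.clip(block / scale, -qmax, qmax))
        out[start:start + group] = q * scale
    return out
```

A single outlier inflates the shared scale and hence the error of every other weight; per-group scales confine that damage to one group, which is the "fine-grained quantization" effect the study measures.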
https://arxiv.org/abs/2303.08302
We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-the-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules. Such general-purpose methods offer advantages of simplicity in design, positive scaling with available compute, and versatile applicability to multiple tasks. Our work builds upon the recent success of self-supervised learning (SSL) for pre-training vision transformers (ViTs). However, while the training recipes for convolutional networks are mature and robust, the recipes for ViTs are contingent and brittle, and in the case of ViTs for visual navigation, yet to be fully discovered. Specifically, we find that vanilla ViTs do not outperform ResNets on visual navigation. We propose the use of a compression layer operating over ViT patch representations to preserve spatial information, along with policy training improvements. These improvements allow us to demonstrate positive scaling laws for the first time in visual navigation tasks. Consequently, our model advances state-of-the-art performance on ImageNav from 54.2% to 82.0% success and performs competitively against concurrent state-of-the-art on ObjectNav with a success rate of 64.0% vs. 65.0%. Overall, this work does not present a fundamentally new approach, but rather recommendations for training a general-purpose architecture that achieves state-of-the-art performance today and could serve as a strong baseline for future methods.
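The role of a compression layer over patch tokens, as opposed to pooling them, can be illustrated with a linear projection that keeps the patch grid intact. This is a schematic of the idea only, not the paper's architecture.

```python
import numpy as np

def compress_patches(tokens, grid_hw, proj):
    """Spatially-aware compression of ViT patch tokens.

    tokens:  (H*W, D) patch representations
    grid_hw: (H, W) shape of the patch grid
    proj:    (D, d) learned projection with d << D

    Mean-pooling tokens into one vector discards where things are;
    projecting each token to a few channels while keeping the grid
    yields a compact embedding in which position is still recoverable.
    """
    h, w = grid_hw
    compressed = tokens @ proj                 # (H*W, d), one small vector per patch
    return compressed.reshape(h, w, proj.shape[1])
```

Two observations that differ only in where objects appear produce identical mean-pooled vectors but different compressed grids, which is why such a layer matters for navigation policies.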
https://arxiv.org/abs/2303.07798
Birth asphyxia is a major cause of newborn mortality in low-resource countries. International guidelines provide treatment recommendations; however, the importance and effect of the different treatments are not fully explored. The available data were collected in Tanzania during newborn resuscitation, for analysis of the resuscitation activities and the response of the newborn. An important step in the analysis is to create activity timelines of the episodes, where activities include ventilation, suction, stimulation, etc. Methods: The available recordings are noisy real-world videos with large variations. We propose a two-step process to detect activities that may overlap in time. The first step is to detect and track the relevant objects, such as the bag-mask resuscitator and heart rate sensors, and the second step is to use this information to recognize the resuscitation activities. The topic of this paper is the first step, and the object detection and tracking are based on convolutional neural networks followed by post-processing. Results: The performance of the object detection during activities was 96.97% (ventilations), 100% (attaching/removing heart rate sensor), and 75% (suction) on a test set of 20 videos. The system also estimates the number of health care providers present, with a performance of 71.16%. Conclusion: The proposed object detection and tracking system provides promising results in noisy newborn resuscitation videos. Significance: This is the first step in a thorough analysis of newborn resuscitation episodes, which could provide important insight into the importance and effect of different newborn resuscitation activities.
https://arxiv.org/abs/2303.07790
We conducted a human subject study of named entity recognition (NER) on a noisy corpus of conversational music recommendation queries containing many irregular and novel named entities. We evaluated human NER linguistic behaviour under these challenging conditions and compared it with the most common NER systems today, fine-tuned transformers. Our goal was to learn about the task in order to guide the design of better evaluation methods and NER algorithms. The results showed that NER in our context was quite hard for both humans and algorithms under a strict evaluation schema; humans had higher precision, while the model had higher recall because of entity exposure, especially during pre-training; and entity types had different error patterns (e.g., frequent typing errors for artists). The released corpus goes beyond predefined frames of interaction and can support future work in conversational music recommendation.
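The strict evaluation schema mentioned above is simple to state precisely: a predicted entity scores only on an exact match of span and type. A minimal sketch:

```python
def strict_ner_scores(gold, predicted):
    """Strict span-level precision and recall: a prediction counts only
    if its (start, end, entity_type) triple matches a gold entity exactly.

    gold, predicted: iterables of (start, end, entity_type) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

Under this schema a near-miss (one token off, or the right span with the wrong type) earns no credit, which is why both humans and models score low on noisy queries.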
https://arxiv.org/abs/2303.06944
Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an item set exponentiates the search space that recommender systems must consider (all subsets of items!): this motivates conversational approaches, where users explicitly state or refine their preferences and systems elicit preferences in natural language, as an efficient way to understand user needs. We call this task conversational item set curation and present a novel data collection methodology that efficiently collects realistic preferences about item sets in a conversational setting by observing both item-level and set-level feedback. We apply this methodology to music recommendation to build the Conversational Playlist Curation Dataset (CPCD), where we show that it leads raters to express preferences that would not otherwise be expressed. Finally, we propose a wide range of conversational retrieval models as baselines for this task and evaluate them on the dataset.
https://arxiv.org/abs/2303.06791
Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various real-world applications, such as search engines and recommender systems. While the ground-truth logging policy, which generates the logged data, is usually unknown, previous work simply plugs its estimated value into off-policy learning, ignoring both the high bias and high variance that result from such an estimator, especially on samples with small and inaccurately estimated logging probabilities. In this work, we explicitly model the uncertainty in the estimated logging policy and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning. Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator against an extensive list of state-of-the-art baselines.
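The core intuition, down-weighting samples whose logging probability is small and uncertainly estimated, can be sketched with a simple shrinkage formula. This is an illustrative stand-in for the idea, not the paper's UIPS estimator.

```python
import numpy as np

def uncertainty_aware_weights(p_hat, p_std, lam=1.0):
    """Shrink inverse-propensity weights where the logging-policy
    estimate is uncertain.

    Plain IPS uses 1 / p_hat; here the weight is damped by the estimated
    standard deviation p_std of the propensity estimate, so samples with
    small, noisy p_hat no longer dominate the objective. As p_std -> 0
    the weight recovers 1 / p_hat.
    """
    return p_hat / (p_hat ** 2 + lam * p_std ** 2)
```

The variance-driven damping trades a little bias for a large variance reduction on exactly the samples the abstract identifies as problematic.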
https://arxiv.org/abs/2303.06389
Sequential recommender systems aim to predict the next item a user will be interested in, given their historical interactions. However, a long-standing issue is how to distinguish between users' long- and short-term interests, which may be heterogeneous and contribute differently to the next recommendation. Existing approaches usually set a pre-defined short-term interest length by exhaustive search or empirical experience, which is either highly inefficient or yields subpar results. Recent advanced transformer-based models can achieve state-of-the-art performance despite the aforementioned issue, but their computational complexity is quadratic in the length of the input sequence. To this end, this paper proposes a novel sequential recommender system, AutoMLP, aiming to better model users' long- and short-term interests from their historical interactions. In addition, we design an automated and adaptive search algorithm that finds a preferable short-term interest length via end-to-end optimization. Through extensive experiments, we show that AutoMLP has competitive performance against state-of-the-art methods, while maintaining linear computational complexity.
https://arxiv.org/abs/2303.06337
Multi-scenario & multi-task learning has been widely applied to many recommendation systems in industrial applications, wherein an effective and practical approach is to carry out multi-scenario transfer learning on the basis of the Mixture-of-Experts (MoE) architecture. However, the MoE-based method, which aims to project all information into the same feature space, cannot effectively deal with the complex relationships inherent among various scenarios and tasks, resulting in unsatisfactory performance. To tackle this problem, we propose a Hierarchical information extraction Network (HiNet) for multi-scenario and multi-task recommendation, which achieves hierarchical extraction based on a coarse-to-fine knowledge transfer scheme. The multiple extraction layers of the hierarchical network enable the model to enhance its capability of transferring valuable information across scenarios while preserving the specific features of scenarios and tasks. Furthermore, a novel scenario-aware attentive network module is proposed to explicitly model correlations between scenarios. Comprehensive experiments conducted on real-world industrial datasets from the Meituan Meishi platform demonstrate that HiNet achieves a new state-of-the-art performance and significantly outperforms existing solutions. HiNet is currently fully deployed in two scenarios and has achieved order quantity gains of 2.87% and 1.75%, respectively.
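For reference, the MoE building block that HiNet starts from mixes expert outputs through a learned softmax gate. Below is a minimal NumPy sketch of that block alone; HiNet's hierarchical extraction layers and scenario-aware attention are built on top of it and are not reproduced here.

```python
import numpy as np

def moe_layer(x, experts, gate_w):
    """Mixture-of-Experts layer: a softmax gate mixes expert outputs per input.

    x:       (batch, d_in) input features
    experts: list of (d_in, d_out) weight matrices, one per expert
    gate_w:  (d_in, n_experts) gating weights
    """
    logits = x @ gate_w
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)        # (batch, n_experts)
    outs = np.stack([x @ w for w in experts], axis=1)  # (batch, n_experts, d_out)
    return (gates[:, :, None] * outs).sum(axis=1)
```

Because every expert output lives in the same feature space before gating, the block has no built-in notion of scenario, which is the limitation the paper's hierarchy addresses.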
https://arxiv.org/abs/2303.06095
Displaying confidence scores in human-AI interaction has been shown to help build trust between humans and AI systems. However, most existing research uses only the confidence score as a form of communication. As confidence scores are just another model output, users may want to understand why the algorithm is confident, in order to determine whether to accept the confidence score. In this paper, we show that counterfactual explanations of confidence scores help study participants to better understand and better trust a machine learning model's prediction. We present two methods for understanding model confidence using counterfactual explanation: (1) based on counterfactual examples; and (2) based on visualisation of the counterfactual space. Both increase understanding and trust for study participants over a baseline of no explanation, but qualitative results show that they are used quite differently, leading to recommendations about when to use each one and directions for designing better explanations.
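Method (1), counterfactual examples for a confidence score, can be sketched for a logistic model: search along the model's own gradient direction for the smallest change that reaches a target confidence. This is illustrative only; the study's models and generation procedure differ.

```python
import numpy as np

def confidence(x, w, b):
    """Confidence of a simple logistic model at input x."""
    return 1 / (1 + np.exp(-(x @ w + b)))

def counterfactual_example(x, w, b, target=0.9, step=0.05, max_steps=400):
    """Smallest move along the steepest-ascent direction that lifts the
    model's confidence to `target` -- one simple answer to "what would
    have to change for the model to be this confident?"."""
    cf = x.astype(float).copy()
    direction = w / np.linalg.norm(w)      # gradient direction of the logit
    for _ in range(max_steps):
        if confidence(cf, w, b) >= target:
            return cf
        cf = cf + step * direction
    return None                            # no counterfactual within budget
```

Presenting `cf - x` to a user ("if these features increased by this much, confidence would reach 90%") is the counterfactual-example style of explanation; sampling many such points instead yields the counterfactual-space visualisation of method (2).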
https://arxiv.org/abs/2303.05729
Conversational recommender systems (CRSs) are improving rapidly according to standard recommendation accuracy metrics. However, it is essential to make sure that these systems are robust when interacting with users, including both regular and malicious users who want to attack the system by feeding it modified input data. In this paper, we propose an adversarial evaluation scheme including four scenarios in two categories, and we automatically generate adversarial examples to evaluate the robustness of these systems in the face of different input data. By executing these adversarial examples, we can compare the ability of different conversational recommender systems to satisfy the user's preferences. We evaluate three CRSs with the proposed adversarial examples on two datasets. Our results show that none of these systems is robust and reliable against the adversarial examples.
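One cheap way to generate such adversarial inputs is character-level noise in the user's utterance. The swap-based perturbation below is an illustrative choice, not the paper's generation scheme; paraphrasing and entity substitution are natural alternatives.

```python
import random

def perturb_query(query, rng, swap_prob=0.15):
    """Character-level noise for robustness testing: randomly swap
    adjacent letters inside words, simulating a typo-injecting (or
    merely sloppy) user. Length and character multiset are preserved."""
    chars = list(query)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2           # skip ahead so a character is swapped at most once
        else:
            i += 1
    return "".join(chars)
```

Feeding both the clean and perturbed queries to a CRS and comparing the recommendations gives a simple robustness probe of the kind the evaluation scheme formalises.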
https://arxiv.org/abs/2303.05575
Food recognition has a wide range of applications, such as health-aware recommendation and self-service restaurants. Most previous methods of food recognition first locate informative regions in some weakly supervised manner and then aggregate their features. However, localization errors of informative regions limit the effectiveness of these methods to some extent. Instead of locating multiple regions, we propose a Progressive Self-Distillation (PSD) method, which progressively enhances the ability of the network to mine more details for food recognition. The training of PSD simultaneously contains multiple self-distillations, in which a teacher network and a student network share the same embedding network. Since the student network receives a modified image from its teacher network, produced by masking some informative regions, the teacher network outputs stronger semantic representations than the student network. Guided by such a teacher network with stronger semantics, the student network is encouraged to mine more useful regions from the modified image by enhancing its own ability. The ability of the teacher network is also enhanced via the shared embedding network. By using progressive training, the teacher network incrementally improves its ability to mine more discriminative regions. In the inference phase, only the teacher network is used, without the help of the student network. Extensive experiments on three datasets demonstrate the effectiveness of our proposed method and its state-of-the-art performance.
https://arxiv.org/abs/2303.05073
Recent work has utilised knowledge-aware approaches for natural language understanding, question answering, recommendation systems, and other tasks. These approaches rely on well-constructed, large-scale knowledge graphs that can be useful for many downstream applications and empower knowledge-aware models with commonsense reasoning. Such knowledge graphs are constructed through knowledge acquisition tasks such as relation extraction and knowledge graph completion. This work seeks to utilise and build on the growing body of work that uses findings from the field of natural language processing (NLP) to extract knowledge from text and build knowledge graphs. The focus of this research project is on how transformer-based approaches can be used to extract and contextualise event information, matching it to existing ontologies, in order to build comprehensive graph-based event representations. Specifically, sub-event extraction is used as a way of creating sub-event-aware event representations. These event representations are then further enriched through fine-grained location extraction and contextualised through the alignment of historically relevant quotes.
https://arxiv.org/abs/2303.04794