As the deployment of pre-trained language models (PLMs) expands, pressing security concerns have arisen regarding the potential for malicious extraction of training data, posing a threat to data privacy. This study is the first to provide a comprehensive survey of training data extraction from PLMs. Our review covers more than 100 key papers in fields such as natural language processing and security. First, preliminary knowledge is recapped and a taxonomy of various definitions of memorization is presented. The approaches for attack and defense are then systematized. Furthermore, the empirical findings of several quantitative studies are highlighted. Finally, future research directions based on this review are suggested.
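To make the attack side concrete, the sketch below illustrates the perplexity-ranking heuristic commonly used in training-data extraction work: sample continuations from a PLM and flag unusually likely ones as possibly memorized. It is a minimal sketch, not a method taken from the surveyed papers; the GPT-2 checkpoint, prompt, and sampling settings are illustrative assumptions.

```python
# Minimal sketch of perplexity-ranking for training-data extraction (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return float(torch.exp(loss))

prompt_ids = tok("Contact me at", return_tensors="pt").input_ids  # assumed probe prompt
samples = model.generate(prompt_ids, do_sample=True, top_k=40, max_length=48,
                         num_return_sequences=8, pad_token_id=tok.eos_token_id)
candidates = [tok.decode(s, skip_special_tokens=True) for s in samples]

# Lower perplexity -> the model assigns the sequence unusually high probability,
# which extraction attacks treat as a signal of possible memorization.
for text in sorted(candidates, key=perplexity)[:3]:
    print(round(perplexity(text), 2), text)
```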
https://arxiv.org/abs/2305.16157
Video behavior recognition is currently one of the most foundational tasks in computer vision. The 2D neural networks of deep learning were built to recognize pixel-level information such as images in RGB, RGB-D, or optical-flow formats, but with the increasingly wide use of surveillance video and the growing number of tasks related to human action recognition, more and more tasks require temporal information for analyzing dependencies between frames. Researchers have therefore widely studied video-based rather than purely image-based (pixel-based) recognition in order to extract more informative elements from geometry tasks. Our survey covers multiple recently proposed works and compares their advantages and disadvantages across the derived deep learning frameworks, rather than machine learning frameworks. The comparison is restricted to existing frameworks and datasets that use video-format data only. Due to the specific properties of human actions and the increasingly wide usage of deep neural networks, we collected all relevant research works from the three years between 2020 and 2022. In the works reviewed in our article, the performance of deep neural networks surpassed most other techniques in feature learning and extraction tasks, especially video action recognition.
https://arxiv.org/abs/2305.15692
Problem definition: Access to accurate predictions of patients' outcomes can enhance medical staff's decision-making, which ultimately benefits all stakeholders in the hospitals. A large hospital network in the US has been collaborating with academics and consultants to predict short-term and long-term outcomes for all inpatients across their seven hospitals. Methodology/results: We develop machine learning models that predict the probabilities of next 24-hr/48-hr discharge and intensive care unit transfers, end-of-stay mortality, and discharge dispositions. All models achieve high out-of-sample AUC (75.7%-92.5%) and are well calibrated. In addition, combining 48-hr discharge predictions with doctors' predictions simultaneously enables more patient discharges (10%-28.7%) and fewer 7-day/30-day readmissions (p-value < 0.001). We implement an automated pipeline that extracts data and updates predictions every morning, as well as user-friendly software and a color-coded alert system to communicate these patient-level predictions (alongside explanations) to clinical teams. Managerial implications: Since we began gradually deploying the tool and training medical staff, over 200 doctors, nurses, and case managers across the seven hospitals have been using it in their daily patient review process. We observe a significant reduction in the average length of stay (0.67 days per patient) following its adoption and anticipate substantial financial benefits (between $55 and $72 million annually) for the healthcare system.
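As a rough illustration of the modeling and evaluation style described above (a probabilistic classifier checked for out-of-sample AUC and calibration), here is a minimal sketch on synthetic data. The feature set, model family, and sizes are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: train a binary "48-hr discharge" classifier, check AUC and calibration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# Synthetic stand-in for patient features and a 48-hr discharge label.
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]              # predicted P(discharge within 48 hr)

print("out-of-sample AUC:", round(roc_auc_score(y_te, p), 3))
frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)
print(np.round(np.c_[mean_pred, frac_pos], 2))  # well calibrated if columns roughly match
```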
https://arxiv.org/abs/2305.15629
Humans have developed the capability to teach relevant aspects of new or adapted tasks to a social peer with very few task demonstrations by making use of scaffolding strategies that leverage prior knowledge and, importantly, prior joint experience to yield a joint understanding and a joint execution of the steps required to solve the task. This process has been discovered and analyzed in parent-infant interaction and constitutes a "co-construction", as it allows both the teacher and the learner to contribute jointly to the task. We propose to focus research in robot interactive learning on this co-construction process to enable robots to learn from non-expert users in everyday situations. In the following, we review current proposals for interactive task learning and discuss their main contributions with respect to the interaction they entail. We then discuss our notion of co-construction and summarize research insights from adult-child and human-robot interactions to elucidate its nature in more detail. From this overview, we finally derive research desiderata spanning the dimensions of architecture, representation, interaction, and explainability.
https://arxiv.org/abs/2305.15535
Large language models of artificial intelligence (AI), such as ChatGPT, find remarkable but controversial applicability in science and research. This paper reviews the epistemological challenges and the ethics and integrity risks in the conduct of science, with the aim of laying timely new foundations for high-quality research ethics review in the era of AI. The role of AI language models as a research instrument and subject is scrutinized, along with the ethical implications for scientists, participants, and reviewers. Ten recommendations shape a response for more responsible research conduct with AI language models.
https://arxiv.org/abs/2305.15299
Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while also revealing the challenges of automatic literature review generation, such as hallucinations and a lack of detailed information. Our dataset and code are available at this https URL.
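For readers unfamiliar with Fusion-in-Decoder, the following is a rough sketch of the underlying idea: each cited paper is encoded independently and a single decoder attends over the concatenated encoder states. The generic T5 checkpoint and toy inputs are assumptions for illustration; this is not the released SciReviewGen code.

```python
# Rough Fusion-in-Decoder-style sketch with a vanilla T5 checkpoint (illustrative only).
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

cited_abstracts = [
    "summarize: Paper A studies retrieval-augmented generation ...",
    "summarize: Paper B proposes a long-document summarizer ...",
]

# Encode every cited paper separately; the "fusion" happens in the decoder.
states = [model.encoder(**tok(t, return_tensors="pt")).last_hidden_state
          for t in cited_abstracts]
fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))

with torch.no_grad():
    out = model.generate(encoder_outputs=fused, max_length=64)
print(tok.decode(out[0], skip_special_tokens=True))
```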
https://arxiv.org/abs/2305.15186
Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing, and game playing, achieving state-of-the-art results. Designed to solve sequential decision-making problems, it learns from experience and can therefore adapt to changing, dynamic environments. These capabilities make it a prime candidate for controlling and optimizing complex processes in industry. The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems. The industrial communication standard Open Platform Communications Unified Architecture (OPC UA) could bridge this gap. However, since RL and OPC UA come from different fields, researchers need to bridge the gap between the two technologies. This work does so by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review to gain insight into how RL and OPC UA are applied in combination. Through this survey, three main research topics at the intersection of RL and OPC UA have been identified. The results of the literature review show that RL is a promising technology for the control and optimization of industrial processes, but it does not yet have the standardized interfaces necessary to be deployed in real-world scenarios with reasonably low effort.
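To give a feel for what combining the two technologies can look like, here is a minimal sketch of a control loop that reads and writes plant variables through the python-opcua client while a trivial epsilon-greedy agent adapts a setpoint. The endpoint URL, node ids, target value, and the toy agent are all illustrative assumptions, not a recommended deployment pattern, and a running OPC UA server is assumed.

```python
# Minimal sketch: epsilon-greedy control over OPC UA (assumes a local server exists).
import random
from opcua import Client

client = Client("opc.tcp://localhost:4840/freeopcua/server/")  # assumed endpoint
client.connect()
try:
    temp_node = client.get_node("ns=2;i=2")      # assumed process variable node
    setpoint_node = client.get_node("ns=2;i=3")  # assumed actuator setpoint node
    q = {0: 0.0, 1: 0.0}                         # value estimates: 0 = lower, 1 = raise
    for step in range(100):
        state = temp_node.get_value()
        action = random.choice([0, 1]) if random.random() < 0.1 else max(q, key=q.get)
        setpoint_node.set_value(75.0 + (1.0 if action else -1.0))
        reward = -abs(state - 75.0)              # stay near an assumed target of 75
        q[action] += 0.1 * (reward - q[action])  # incremental bandit-style update
finally:
    client.disconnect()
```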
https://arxiv.org/abs/2305.15113
Non-adversarial robustness, also known as natural robustness, is a property of deep learning models that enables them to maintain performance even when faced with distribution shifts caused by natural variations in data. However, achieving this property is challenging because it is difficult to predict in advance the types of distribution shifts that may occur. To address this challenge, researchers have proposed various approaches, some of which anticipate potential distribution shifts, while others utilize knowledge about the shifts that have already occurred to enhance model generalizability. In this paper, we present a brief overview of the most recent techniques for improving the robustness of computer vision methods, as well as a summary of commonly used robustness benchmark datasets for evaluating the model's performance under data distribution shifts. Finally, we examine the strengths and limitations of the approaches reviewed and identify general trends in deep learning robustness improvement for computer vision.
https://arxiv.org/abs/2305.14986
Many real-world applications require surfacing extracted snippets to users, whether motivated by assistive tools for literature surveys or document cross-referencing, or by the need to mitigate and recover from model-generated inaccuracies. Yet these passages can be difficult to consume when divorced from their original document context. In this work, we explore the limits of LLMs in performing decontextualization of document snippets in user-facing scenarios, focusing on two real-world settings: question answering and citation context previews for scientific documents. We propose a question-answering framework for decontextualization that allows for better handling of user information needs and preferences when determining the scope of rewriting. We present results showing that state-of-the-art LLMs under our framework remain competitive with end-to-end approaches. We also explore incorporating user preferences into the system, finding that our framework allows for controllability.
https://arxiv.org/abs/2305.14772
We propose attribute-aware multimodal entity linking, where the input is a mention described with text and an image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) in which each entity is also described with a text description, a visual image, and a set of attributes and values. To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products. To establish baseline performance on AMELI, we experiment with the current state-of-the-art multimodal entity linking approaches and our enhanced attribute-aware model, and we demonstrate the importance of incorporating attribute information into the entity linking process. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for the attribute-aware multimodal entity linking task. Datasets and code will be made publicly available.
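To make the task setup concrete, the sketch below scores KB candidates by fusing text, image, and attribute similarities for a mention. The random stand-in encoders and fusion weights are assumptions for illustration, not the model proposed in the paper.

```python
# Schematic attribute-aware candidate scoring for multimodal entity linking (illustrative).
import torch
import torch.nn.functional as F

def score(mention, entity, w=(0.4, 0.4, 0.2)):
    """Weighted fusion of text, image, and attribute cosine similarities."""
    sims = [F.cosine_similarity(mention[k], entity[k], dim=0)
            for k in ("text", "image", "attr")]
    return sum(wi * si for wi, si in zip(w, sims))

def embed():
    return torch.randn(256)  # stand-in for real text/image/attribute encoders

mention = {"text": embed(), "image": embed(), "attr": embed()}
candidates = [{"text": embed(), "image": embed(), "attr": embed()} for _ in range(5)]
best = max(range(len(candidates)), key=lambda i: score(mention, candidates[i]))
print("linked entity index:", best)
```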
https://arxiv.org/abs/2305.14725
The past decade has witnessed many great successes of machine learning (ML) and deep learning (DL) applications in agricultural systems, including weed control, plant disease diagnosis, agricultural robotics, and precision livestock management. Despite tremendous progress, one downside of such ML/DL models is that they generally rely on large-scale labeled datasets for training, and the performance of such models is strongly influenced by the size and quality of available labeled data samples. In addition, collecting, processing, and labeling such large-scale datasets is extremely costly and time-consuming, partially due to the rising cost of human labor. Therefore, developing label-efficient ML/DL methods for agricultural applications has received significant interest among researchers and practitioners. In fact, more than 50 papers on developing and applying deep-learning-based label-efficient techniques to various agricultural problems have appeared since 2016, which motivates the authors to provide a timely and comprehensive review of recent label-efficient ML/DL methods in agricultural applications. To this end, we first develop a principled taxonomy that organizes these methods according to the degree of supervision, covering weak supervision (i.e., active learning and semi-/weakly-supervised learning) and no supervision (i.e., un-/self-supervised learning), supplemented by representative state-of-the-art label-efficient ML/DL methods. In addition, a systematic review of various agricultural applications exploiting these label-efficient algorithms, such as precision agriculture, plant phenotyping, and postharvest quality assessment, is presented. Finally, we discuss the current problems and challenges, as well as future research directions. A well-classified paper list can be accessed at this https URL.
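As one concrete instance of the weak-supervision branch of the taxonomy, here is a minimal active-learning loop with least-confident uncertainty sampling on synthetic data; the classifier, seed-set size, and query budget are illustrative assumptions rather than a method from any particular surveyed paper.

```python
# Minimal active-learning sketch: uncertainty sampling on synthetic stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(20))                              # tiny labeled seed set
pool = [i for i in range(len(y)) if i not in labeled]  # "unlabeled" pool

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    uncertainty = 1.0 - probs.max(axis=1)              # least-confident sampling
    query = [pool[i] for i in np.argsort(-uncertainty)[:20]]  # send these to an annotator
    labeled += query
    pool = [i for i in pool if i not in set(query)]
    print(f"round {round_}: {len(labeled)} labels, accuracy={clf.score(X, y):.3f}")
```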
https://arxiv.org/abs/2305.14691
This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation. This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications. We also perform a thorough comparison between diffusion models and alternative generative models, specifically highlighting the autoregressive (AR) models, while also examining how diverse architectures incorporate the Transformer in conjunction with diffusion models. Compared to AR models, diffusion models have significant advantages in parallel generation, text interpolation, token-level controls such as syntactic structures and semantic contents, and robustness. Exploring further permutations of integrating Transformers into diffusion models would be a valuable pursuit. Also, the development of multimodal diffusion models and of large-scale diffusion language models with notable few-shot learning capabilities would be important directions for the future advancement of diffusion models in NLP.
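For orientation, the sketch below shows the closed-form forward (noising) step used by continuous diffusion models over token embeddings, which several of the surveyed NLP approaches build on; the noise schedule and tensor sizes are illustrative assumptions, not a specific paper's setup.

```python
# Forward (noising) process of a Gaussian diffusion model over token embeddings (sketch).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    noise = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise

x0 = torch.randn(8, 16, 64)   # batch of 8 "sentences", 16 tokens, 64-dim embeddings
for t in (0, 250, 999):
    xt = q_sample(x0, t)
    print(t, round(float((xt - x0).pow(2).mean()), 3))  # corruption grows with t
```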
https://arxiv.org/abs/2305.14671
Opinions in the scientific domain can be divergent, leading to controversy or consensus among reviewers. However, current opinion summarization datasets mostly focus on product review domains, which do not account for this variability under the assumption that the input opinions are non-controversial. To address this gap, we propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews. To facilitate this task, we introduce ORSUM, a new dataset covering 10,989 paper meta-reviews and 40,903 paper reviews from 39 conferences. Furthermore, we propose the Checklist-guided Iterative Introspection (CGI$^2$) approach, which breaks the task down into several stages and iteratively refines the summary under the guidance of questions from a checklist. We conclude that (1) human-written summaries are not always reliable, since many do not follow the guideline, and (2) the combination of task decomposition and iterative self-refinement shows promising discussion-involvement ability and can be applied to other complex text generation tasks using black-box LLMs.
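The following is a schematic sketch of the checklist-guided iterative refinement idea: draft a meta-review, then repeatedly critique and revise it against checklist questions with a black-box LLM. The `call_llm` callable and the checklist items are hypothetical stand-ins, not the authors' exact prompts or checklist.

```python
# Schematic checklist-guided iterative refinement loop (call_llm is a hypothetical stand-in
# for any chat-completion API; prompts and checklist items are illustrative assumptions).
from typing import Callable, List

def checklist_guided_summarize(reviews: List[str], checklist: List[str],
                               call_llm: Callable[[str], str], rounds: int = 2) -> str:
    draft = call_llm("Write a meta-review of these paper reviews:\n" + "\n---\n".join(reviews))
    for _ in range(rounds):
        for question in checklist:
            critique = call_llm(f"Checklist item: {question}\nDraft:\n{draft}\n"
                                "Does the draft satisfy this item? Answer and explain.")
            draft = call_llm("Revise the draft to address this critique.\n"
                             f"Critique:\n{critique}\nDraft:\n{draft}")
    return draft

example_checklist = ["Does it state the main strengths?",
                     "Does it report points of disagreement among reviewers?",
                     "Does it give a clear recommendation?"]
```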
https://arxiv.org/abs/2305.14647
Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, its expressed sentiment, and the corresponding aspect target. However, existing methods are limited to the in-domain setting with only two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain, and cross-domain settings. We support the new benchmark by annotating more than 4000 data samples for two new domains based on hotel and cosmetics reviews. Our analysis of five existing methods shows that, while there is a significant gap between in-domain and out-of-domain performance, generative methods have strong potential for domain generalization. Our datasets, code implementation, and models are available at this https URL.
https://arxiv.org/abs/2305.14434
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts, which contains only test sets, without training or development data. We adapt six tasks from the SCROLLS benchmark and add four new datasets, including two novel information-fusing tasks, such as aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive evaluation of both open-source and closed large language models, finding that Claude outperforms ChatGPT and that GPT-4 achieves the highest average score. However, there is still room for improvement on multiple open challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to pass the naive baseline. As the state of the art is a moving target, we invite researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
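To illustrate what an aggregation-style task asks of a model, the sketch below computes the percentage of positive reviews with a toy keyword heuristic and contrasts it with a constant 50% guess; both are illustrative assumptions, not the benchmark's official baselines.

```python
# Toy sketch of an aggregation task: estimate the percentage of positive reviews.
def percent_positive(reviews, is_positive):
    return 100.0 * sum(is_positive(r) for r in reviews) / len(reviews)

def keyword_heuristic(review: str) -> bool:
    # Crude positivity check based on a few assumed keywords.
    return any(w in review.lower() for w in ("love", "great", "excellent"))

reviews = ["Loved it, would buy again.", "Terrible quality.", "Works great."]
print("heuristic estimate:", round(percent_positive(reviews, keyword_heuristic), 1), "%")
print("constant-guess reference:", 50.0, "%")
```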
https://arxiv.org/abs/2305.14196
Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training data distribution. This poses important questions about the robustness of NLP models and their high accuracy, which may be artificially inflated due to their underlying sensitivity to systematic biases. Despite these challenges, there is a lack of comprehensive surveys on the generalization challenge in text classification from an OOD perspective. Therefore, this paper aims to fill this gap by presenting the first comprehensive review of recent progress, methods, and evaluations on this topic. We further discuss the challenges involved and potential future research directions. By providing quick access to existing work, we hope this survey will encourage future research in this area.
https://arxiv.org/abs/2305.14104
The latest developments in computer hardware, sensor technologies, and artificial intelligence can make virtual reality (VR) and virtual spaces an important part of human everyday life. Eye tracking offers not only a hands-free way of interaction but also the possibility of a deeper understanding of human visual attention and cognitive processes in VR. Despite these possibilities, eye-tracking data also reveal privacy-sensitive attributes of users when combined with information about the presented stimulus. To address these possibilities and the associated privacy issues, in this survey we first cover major works in the eye tracking, VR, and privacy areas between the years 2012 and 2022. The eye tracking in VR part covers the complete pipeline of eye-tracking methodology, from pupil detection and gaze estimation to offline use and analysis; for privacy and security, we focus on eye-based authentication as well as computational methods to preserve the privacy of individuals and their eye-tracking data in VR. Finally, taking all of this into consideration, we draw three main directions for the research community, focusing mainly on privacy challenges. In summary, this survey provides an extensive literature review of the possibilities opened up by eye tracking in VR and the privacy implications of those possibilities.
https://arxiv.org/abs/2305.14080
Developing and integrating advanced image sensors with novel algorithms in camera systems has become prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views between industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). With the success of the 1st MIPI Workshop@ECCV 2022, we introduce the second MIPI challenge, which includes four tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track of MIPI 2023. In total, 120 participants were successfully registered, and 11 teams submitted results in the final testing phase. The solutions developed in this challenge achieved state-of-the-art performance on nighttime flare removal. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at this https URL.
https://arxiv.org/abs/2305.13770
Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradictory evidence reported across different documents. Prior work has shown that, rather than performing the task, models may exploit shortcuts that are difficult to detect using standard n-gram similarity metrics such as ROUGE. Better automated evaluation metrics are needed, but few resources exist to assess metrics when they are proposed. Therefore, we introduce a dataset of human-assessed summary quality facets and pairwise preferences to encourage and support the development of better automated evaluation methods for literature review MDS. We take advantage of community submissions to the Multi-document Summarization for Literature Review (MSLR) shared task to compile a diverse and representative sample of generated summaries. We analyze how automated summarization evaluation metrics correlate with lexical features of generated summaries, with other automated metrics (including several we propose in this work), and with aspects of human-assessed summary quality. We find that not only do automated metrics fail to capture aspects of quality as assessed by humans, but in many cases the system rankings produced by these metrics are also anti-correlated with the rankings of human annotators.
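A minimal sketch of the meta-evaluation step described above: rank systems by an automated metric and by human judgments, then measure rank correlation. The scores are made-up toy numbers purely to show the computation; a negative correlation corresponds to the anti-correlated rankings the authors report.

```python
# Rank-correlation sketch between an automated metric and human judgments (toy numbers).
from scipy.stats import kendalltau

systems      = ["sys_a", "sys_b", "sys_c", "sys_d"]
rouge_scores = [0.42, 0.39, 0.47, 0.35]   # toy automated-metric score per system
human_scores = [3.1, 3.8, 2.6, 3.5]       # toy mean human quality rating per system

tau, p = kendalltau(rouge_scores, human_scores)
print(f"Kendall tau = {tau:.2f} (p = {p:.2f})")  # tau < 0: metric inverts the human ranking
```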
https://arxiv.org/abs/2305.13693
Although supervised learning has been highly successful in improving the state of the art in image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Shortly afterwards, generative SSL frameworks, mostly based on masked image modeling, complemented and surpassed the results obtained with discriminative SSL. Consequently, within a span of three years, over 100 unique general-purpose frameworks for generative and discriminative SSL, with a focus on imaging, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, providing a historical view and paying attention to best practices as well as useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in image-based SSL. Lastly, to aid researchers who aim to contribute to image-focused SSL, we outline a number of promising research directions.
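As a pointer for readers new to the discriminative SSL family, here is a compact sketch of a SimCLR-style InfoNCE contrastive objective; the batch size, embedding dimension, and temperature are illustrative assumptions.

```python
# Compact InfoNCE (contrastive) loss sketch for SimCLR-style discriminative SSL.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are embeddings of two augmented views of the same image."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # cosine similarities of all pairs
    targets = torch.arange(z1.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)   # stand-ins for encoder outputs
print(float(info_nce(z1, z2)))
```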
https://arxiv.org/abs/2305.13689