The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long-document analysis are quite different from those of shorter texts, and the ever-increasing size of documents uploaded online has made NLP on long documents a critical area of research. This paper surveys the current state of the art in the domain, overviewing the relevant neural building blocks and subsequently focusing on two main NLP tasks, Document Classification and Summarization, while also mentioning uses in Sentiment Analysis. We detail the challenges, issues, and current solutions related to long-document NLP. We also list publicly available, labelled, long-document datasets used in current research.
https://arxiv.org/abs/2305.16259
As the deployment of pre-trained language models (PLMs) expands, pressing security concerns have arisen regarding the potential for malicious extraction of training data, posing a threat to data privacy. This study is the first to provide a comprehensive survey of training data extraction from PLMs. Our review covers more than 100 key papers in fields such as natural language processing and security. First, preliminary knowledge is recapped and a taxonomy of various definitions of memorization is presented. The approaches for attack and defense are then systemized. Furthermore, the empirical findings of several quantitative studies are highlighted. Finally, future research directions based on this review are suggested.
https://arxiv.org/abs/2305.16157
Microservices is a popular architectural style for the development of distributed software, with an emphasis on modularity, scalability, and flexibility. Indeed, in microservice systems, functionalities are provided by loosely coupled, small services, each focusing on a specific business capability. Building a system according to the microservices architectural style brings a number of challenges, mainly related to how the different microservices are deployed and coordinated and how they interact. In this paper, we provide a survey about how techniques in the area of Artificial Intelligence have been used to tackle these challenges.
https://arxiv.org/abs/2305.16092
As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this "reality check", we study how three commercial MT systems translate 3rd-person pronouns. Concretely, we compare the translations of gendered vs. gender-neutral pronouns from English to five other languages (Danish, Farsi, French, German, Italian), and vice versa, from Danish to English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Similarly, gender neutrality is often not preserved. By surveying the opinions of affected native speakers from diverse languages, we provide recommendations to address the issue in future MT research.
https://arxiv.org/abs/2305.16051
Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However, evaluations of these attacks ignore the property of imperceptibility or study it under limited settings. This entails that adversarial perturbations would not pass any human quality gate and do not represent real threats to human-checked NLP systems. To bypass this limitation and enable proper assessment (and later, improvement) of NLP model robustness, we have surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods. Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions regarding attack success. Through our work, we hope to position human perceptibility as a first-class success criterion for text attacks, and provide guidance for research to build effective attack algorithms and, in turn, design appropriate defence mechanisms.
https://arxiv.org/abs/2305.15587
The great behavioral heterogeneity observed between individuals with the same psychiatric disorder and even within one individual over time complicates both clinical practice and biomedical research. However, modern technologies are an exciting opportunity to improve behavioral characterization. Existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at a greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge -- one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof of concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture.
https://arxiv.org/abs/2305.15385
The article discusses the localization of radiation sources whose number and other relevant parameters are not known in advance. The data collection is ensured by an autonomous mobile robot that performs a survey in a defined region of interest populated with static obstacles. The measurement trajectory is information-driven rather than pre-planned. The localization exploits a regularized particle filter estimating the sources' parameters continuously. The dynamic robot control switches between two modes, one attempting to minimize the Shannon entropy and the other aiming to reduce the variance of expected measurements in unexplored parts of the target area; both of the modes maintain safe clearance from the obstacles. The performance of the algorithms was tested in a simulation study based on real-world data acquired previously from three radiation sources exhibiting various activities. Our approach reduces the time necessary to explore the region and to find the sources by approximately 40 %; at present, however, the method is unable to reliably localize sources that have a relatively low intensity. In this context, additional research has been planned to increase the credibility and robustness of the procedure and to improve the robotic platform autonomy.
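The entropy-minimizing control mode described above can be given a concrete flavor with a short sketch. The function name and toy weight vectors below are illustrative assumptions, not taken from the paper: the Shannon entropy of the particle filter's normalized weight vector quantifies how concentrated the current belief about the sources' parameters is.

```python
import numpy as np

def shannon_entropy(weights):
    """Shannon entropy (in nats) of a normalized particle-weight vector.

    A control mode minimizing this quantity steers the robot toward
    measurements expected to concentrate the belief over source parameters.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()            # normalize to a probability vector
    w = w[w > 0]               # treat 0 * log(0) as 0
    return float(-(w * np.log(w)).sum())

# A uniform belief is maximally uncertain; a peaked one is nearly certain.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])   # log(4) ≈ 1.386
peaked = shannon_entropy([0.97, 0.01, 0.01, 0.01])
```

A uniform weight vector attains the maximum entropy log(N), while a sharply peaked one approaches zero; an information-driven controller would prefer actions expected to lower this value.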
https://arxiv.org/abs/2305.15240
Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing and game playing, achieving state-of-the-art results. Targeted at sequential decision making problems, it is by design able to learn from experience and therefore adapt to changing dynamic environments. These capabilities make it a prime candidate for controlling and optimizing complex processes in industry. The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems. The industrial communication standard Open Platform Communications Unified Architecture (OPC UA) could bridge this gap. However, since RL and OPC UA are from different fields, there is a need for researchers to bridge the gap between the two technologies. This work serves that purpose by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review to gain insights on how RL and OPC UA are applied in combination. Through this survey, three main research topics at the intersection of RL and OPC UA have been identified. The results of the literature review show that RL is a promising technology for the control and optimization of industrial processes, but does not yet have the necessary standardized interfaces to be deployed in real-world scenarios with reasonably low effort.
https://arxiv.org/abs/2305.15113
Recent explorations with commercial Large Language Models (LLMs) have shown that non-expert users can jailbreak LLMs by simply manipulating the prompts, resulting in degenerate output behavior, privacy and security breaches, offensive outputs, and violations of content regulator policies. Few formal studies have been carried out to formalize and analyze these attacks and their mitigations. We bridge this gap by proposing a formalism and a taxonomy of known (and possible) jailbreaks. We perform a survey of existing jailbreak methods and their effectiveness on open-source and commercial LLMs (such as GPT 3.5, OPT, BLOOM, and FLAN-T5-xxl). We further propose a limited set of prompt guards and discuss their effectiveness against known attack types.
https://arxiv.org/abs/2305.14965
An important aspect of developing LLMs that interact with humans is to align models' behavior to their users. It is possible to prompt an LLM into behaving as a certain persona, especially a user group or ideological persona the model captured during its pretraining stage. However, how best to align an LLM with a specific user, rather than a demographic or ideological group, remains an open question. Mining public opinion surveys (by Pew Research), we find that the opinions of a user and their demographics and ideologies are not mutual predictors. We use this insight to align LLMs by modeling both user opinions as well as user demographics and ideology, achieving accuracy gains of up to 7 points in predicting public opinions from survey questions across a broad set of topics. In addition to the typical approach of prompting LLMs with demographics and ideology, we discover that utilizing the most relevant past opinions from individual users enables the model to predict user opinions more accurately.
https://arxiv.org/abs/2305.14929
Many real-world applications require surfacing extracted snippets to users, whether motivated by assistive tools for literature surveys or document cross-referencing, or by the need to mitigate and recover from model-generated inaccuracies. Yet, these passages can be difficult to consume when divorced from their original document context. In this work, we explore the limits of LLMs to perform decontextualization of document snippets in user-facing scenarios, focusing on two real-world settings: question answering and citation context previews for scientific documents. We propose a question-answering framework for decontextualization that allows for better handling of user information needs and preferences when determining the scope of rewriting. We present results showing that state-of-the-art LLMs under our framework remain competitive with end-to-end approaches. We also explore incorporating user preferences into the system, finding that our framework allows for controllability.
https://arxiv.org/abs/2305.14772
This survey paper provides a comprehensive review of the use of diffusion models in natural language processing (NLP). Diffusion models are a class of mathematical models that aim to capture the diffusion of information or signals across a network or manifold. In NLP, diffusion models have been used in a variety of applications, such as natural language generation, sentiment analysis, topic modeling, and machine translation. This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications. We also perform a thorough comparison between diffusion models and alternative generative models, specifically highlighting the autoregressive (AR) models, while also examining how diverse architectures incorporate the Transformer in conjunction with diffusion models. Compared to AR models, diffusion models have significant advantages for parallel generation, text interpolation, token-level controls such as syntactic structures and semantic contents, and robustness. Exploring further permutations of integrating Transformers into diffusion models would be a valuable pursuit. Also, the development of multimodal diffusion models and large-scale diffusion language models with notable capabilities for few-shot learning would be important directions for the future advance of diffusion models in NLP.
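For readers unfamiliar with the models discussed above, the forward (noising) process common to these diffusion formulations can be sketched in a few lines. The linear beta schedule and toy data below are illustrative assumptions, not drawn from any surveyed work.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) for a Gaussian forward diffusion process.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise,
    where alpha_bar_t is the cumulative product of (1 - beta) up to step t.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)        # a commonly assumed linear schedule
x0 = rng.standard_normal(16)                 # stand-in for an embedded token sequence
x_mid = forward_diffuse(x0, 500, betas, rng)
x_end = forward_diffuse(x0, 999, betas, rng) # almost pure noise by the last step
```

Because every step is independent given x_0, a model can be trained to denoise all positions in parallel, which is the source of the parallel-generation advantage over AR models noted above.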
https://arxiv.org/abs/2305.14671
Representing texts as information about entities has long been deemed effective in event reasoning. We propose OpenPI2.0, an improved dataset for tracking entity states in procedural texts. OpenPI2.0 features not only canonicalized entities that facilitate evaluation, but also salience annotations including both manual labels and automatic predictions. Regarding entity salience, we provide a survey on annotation subjectivity, modeling feasibility, and downstream applications in tasks such as question answering and classical planning.
https://arxiv.org/abs/2305.14603
Text-to-image generation has attracted significant interest from researchers and practitioners in recent years due to its widespread and diverse applications across various industries. Despite the progress made in the domain of vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others. In addition to the studies discussed in this paper, we are also committed to continually updating the latest relevant papers, datasets, application projects and corresponding information at this https URL
https://arxiv.org/abs/2305.14598
Deep learning models developed for time-series tasks have become more widely researched in recent years. However, due to the unintuitive nature of time-series data, the interpretability problem -- understanding what is under the hood of these models -- becomes crucial. The advancement of similar studies in computer vision has given rise to many post-hoc methods, which can also shed light on how to explain time-series models. In this paper, we present a wide range of post-hoc interpretation methods for time-series models based on backpropagation, perturbation, and approximation. We also bring focus onto inherently interpretable models, a novel category of interpretation where human-understandable information is designed within the models. Furthermore, we introduce common evaluation metrics used for the explanations, and propose several directions for future research on the time-series interpretability problem. As a highlight, our work summarizes not only well-established interpretation methods, but also a handful of fairly recent and under-developed techniques whose essence we hope to capture, sparking future endeavours to innovate and improve.
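To give the perturbation family mentioned above a concrete flavor, here is a minimal occlusion-style sketch. The function name, baseline choice, and toy model are our own illustrative assumptions, not a specific method from the survey: each time step is replaced by a baseline value in turn and scored by how much the model's output changes.

```python
import numpy as np

def occlusion_importance(model, x, baseline=0.0):
    """Perturbation-based importance scores for a univariate time series.

    Each time step is replaced by `baseline`; the absolute change in the
    model's scalar output is used as that step's importance score.
    """
    ref = model(x)
    scores = np.empty(len(x), dtype=float)
    for t in range(len(x)):
        perturbed = x.copy()
        perturbed[t] = baseline
        scores[t] = abs(model(perturbed) - ref)
    return scores

# Toy model in which only the last time step matters; the scores reflect that.
model = lambda x: 3.0 * x[-1]
x = np.array([0.5, -1.0, 2.0])
scores = occlusion_importance(model, x)   # → [0., 0., 6.]
```

Backpropagation-based methods differ mainly in replacing this O(T) loop of forward passes with a single gradient computation through the model.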
https://arxiv.org/abs/2305.14582
Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training data distribution. This poses important questions about the robustness of NLP models and their high accuracy, which may be artificially inflated due to their underlying sensitivity to systematic biases. Despite these challenges, there is a lack of comprehensive surveys on the generalization challenge from an OOD perspective in text classification. Therefore, this paper aims to fill this gap by presenting the first comprehensive review of recent progress, methods, and evaluations on this topic. We further discuss the challenges involved and potential future research directions. By providing quick access to existing work, we hope this survey will encourage future research in this area.
https://arxiv.org/abs/2305.14104
Latest developments in computer hardware, sensor technologies, and artificial intelligence can make virtual reality (VR) and virtual spaces an important part of human everyday life. Eye tracking offers not only a hands-free way of interaction but also the possibility of a deeper understanding of human visual attention and cognitive processes in VR. Despite these possibilities, eye-tracking data also reveal privacy-sensitive attributes of users when combined with information about the presented stimulus. To address these possibilities and potential privacy issues, in this survey, we first cover major works in eye tracking, VR, and privacy areas between the years 2012 and 2022. While the eye tracking in VR part covers the complete pipeline of eye-tracking methodology, from pupil detection and gaze estimation to offline use and analyses, for privacy and security, we focus on eye-based authentication as well as computational methods to preserve the privacy of individuals and their eye-tracking data in VR. Finally, taking all of this into consideration, we outline three main directions for the research community, focusing mainly on privacy challenges. In summary, this survey provides an extensive literature review of the utmost possibilities with eye tracking in VR and the privacy implications of those possibilities.
https://arxiv.org/abs/2305.14080
Although supervised learning has been highly successful in improving the state-of-the-art in the domain of image-based computer vision in the past, the margin of improvement has diminished significantly in recent years, indicating that a plateau is in sight. Meanwhile, the use of self-supervised learning (SSL) for the purpose of natural language processing (NLP) has seen tremendous successes during the past couple of years, with this new learning paradigm yielding powerful language models. Inspired by the excellent results obtained in the field of NLP, self-supervised methods that rely on clustering, contrastive learning, distillation, and information-maximization, which all fall under the banner of discriminative SSL, have experienced a swift uptake in the area of computer vision. Shortly afterwards, generative SSL frameworks that are mostly based on masked image modeling, complemented and surpassed the results obtained with discriminative SSL. Consequently, within a span of three years, over 100 unique general-purpose frameworks for generative and discriminative SSL, with a focus on imaging, were proposed. In this survey, we review a plethora of research efforts conducted on image-oriented SSL, providing a historic view and paying attention to best practices as well as useful software packages. While doing so, we discuss pretext tasks for image-based SSL, as well as techniques that are commonly used in image-based SSL. Lastly, to aid researchers who aim at contributing to image-focused SSL, we outline a number of promising research directions.
https://arxiv.org/abs/2305.13689
Misinformation, i.e. factually incorrect information, is often conveyed in multiple modalities, e.g. an image accompanied by a caption. It is perceived as more credible by humans, and spreads faster and wider than its text-only counterparts. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on textual misinformation. In this survey, we conceptualise a framework for AFC including subtasks unique to multimodal misinformation. Furthermore, we discuss related terminology developed in different communities in the context of our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future research.
https://arxiv.org/abs/2305.13507
Neural machine translation (NMT) methods developed for natural language processing have been shown to be highly successful in automating translation from one natural language to another. Recently, these NMT methods have been adapted to the generation of program code. In NMT for code generation, the task is to generate output source code that satisfies constraints expressed in the input. In the literature, a variety of different input scenarios have been explored, including generating code based on natural language description, lower-level representations such as binary or assembly (neural decompilation), partial representations of source code (code completion and repair), and source code in another language (code translation). In this paper, we survey the NMT for code generation literature, cataloging the variety of methods that have been explored according to input and output representations, model architectures, optimization techniques used, data sets, and evaluation methods. We discuss the limitations of existing methods and future research directions.
https://arxiv.org/abs/2305.13504