This paper presents findings from an exploratory needfinding study investigating the current research status of, and potential for competition participation within, the robotics community on four human-centric topics: safety, privacy, explainability, and federated learning. We surveyed 34 participants across three distinguished European robotics consortia, nearly 60% of whom had more than five years of research experience in robotics. Our qualitative and quantitative analysis revealed that mainstream robotics researchers prioritize safety and explainability, expressing a greater willingness to invest in further research in these areas. Conversely, our results indicate that privacy and federated learning garner less attention and are perceived to have lower potential. The study also suggests a lack of enthusiasm within the robotics community for participating in competitions on these topics. Based on these findings, we recommend targeting other communities, such as the machine learning community, for future competitions on these four human-centric topics.
https://arxiv.org/abs/2403.18616
Spiking neural networks (SNN) are a biologically inspired model of neural networks with certain brain-like properties. Over the past few decades, this model has received increasing attention in the computer science community, owing in part to the success of deep learning. In SNN, communication between neurons takes place through spikes and spike trains. This differentiates these models from ``standard'' artificial neural networks (ANN), where spike frequencies are replaced by real-valued signals. Spiking neural P systems (SNPS) can be considered a branch of SNN based more on the principles of formal automata, with many variants developed within the framework of membrane computing theory. In this paper, we first briefly compare the structure and function, advantages and drawbacks of SNN and SNPS. A key part of the article is a survey of recent results and applications of machine learning and deep learning models in both the SNN and SNPS formalisms.
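To make the spike-based communication mentioned in this abstract concrete, here is a minimal leaky integrate-and-fire neuron that turns real-valued input currents into a binary spike train. This is a generic textbook sketch, not a model from the paper; the decay factor and threshold are arbitrary illustrative values.

```python
def lif_spike_train(inputs, tau=0.9, threshold=1.0):
    # Leaky integrate-and-fire: the membrane potential decays by a factor
    # tau each step, accumulates the input current, and emits a spike
    # (resetting to zero) whenever it crosses the threshold.
    v, spikes = 0.0, []
    for current in inputs:
        v = tau * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

print(lif_spike_train([0.4, 0.4, 0.4, 0.4, 0.0, 1.2]))  # -> [0, 0, 1, 0, 0, 1]
```

The output spike train, rather than a real-valued activation, is what downstream neurons would receive — the contrast with standard ANNs that the abstract draws.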
https://arxiv.org/abs/2403.18609
Despite the considerable efforts being made to monitor and regulate user-generated content on social media platforms, the pervasiveness of offensive language, such as hate speech or cyberbullying, in the digital space remains a significant challenge. Given the importance of maintaining a civilized and respectful online environment, there is an urgent and growing need for automatic systems capable of detecting offensive speech in real time. However, developing effective systems for processing languages such as Chinese presents a significant challenge, owing to the language's complex and nuanced nature, which makes it difficult to process automatically. This paper provides a comprehensive overview of offensive language detection in Chinese, examining current benchmarks and approaches and highlighting specific models and tools for addressing the unique challenges of detecting offensive language in this complex language. The primary objective of this survey is to explore the existing techniques and identify potential avenues for further research that can address the cultural and linguistic complexities of Chinese.
https://arxiv.org/abs/2403.18314
As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data necessitates accurate prediction of future scenarios and the motion states of targets of interest, particularly in applications like traffic management and disaster response. Existing video prediction methods focus solely on predicting future scenes (video frames), neglecting to explicitly model the target's motion states, which are crucial for aerial video interpretation. To address this issue, we introduce a novel task called Target-Aware Aerial Video Prediction, which aims to simultaneously predict future scenes and the target's motion states. Further, we design a model specifically for this task, named TAFormer, which provides a unified modeling approach for both video and target motion states. Specifically, we introduce Spatiotemporal Attention (STA), which decouples the learning of video dynamics into spatial static attention and temporal dynamic attention, effectively modeling scene appearance and motion. Additionally, we design an Information Sharing Mechanism (ISM), which elegantly unifies the modeling of video and target motion by facilitating information interaction through two sets of messenger tokens. Moreover, to alleviate the difficulty of distinguishing targets in blurry predictions, we introduce a Target-Sensitive Gaussian Loss (TSGL), enhancing the model's sensitivity to both the target's position and content. Extensive experiments on UAV123VP and VisDroneVP (derived from single-object tracking datasets) demonstrate the exceptional performance of TAFormer in target-aware video prediction, showcasing its adaptability to the target-awareness requirements of aerial video interpretation.
https://arxiv.org/abs/2403.18238
The advent of Large Language Models (LLMs) has ushered in a new era of possibilities in the realm of education. This survey paper summarizes the various technologies of LLMs in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education. Furthermore, we outline future research opportunities, highlighting promising directions. Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
https://arxiv.org/abs/2403.18105
Deep learning techniques have been explored for the marine litter problem for approximately 20 years, but the majority of the research has developed rapidly in the last five years. We provide an in-depth, up-to-date summary and analysis of 28 of the most recent and significant contributions of deep learning to marine debris research. Cross-referencing results across these papers shows that the YOLO family significantly outperforms all other methods of object detection, yet many respected contributions to this field have categorically agreed that a comprehensive database of underwater debris is not currently available for machine learning. Using a small dataset curated and labelled by us, we tested YOLOv5 on a binary classification task and found that accuracy was low and the rate of false positives was high, highlighting the importance of a comprehensive database. We conclude this survey with over 40 future research recommendations and open challenges.
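The two quantities this abstract reports for the YOLOv5 experiment — accuracy and false-positive rate on a binary debris/no-debris task — can be computed as follows. The predictions and labels below are hypothetical toy data, not the authors' results.

```python
def binary_metrics(preds, labels):
    # Standard confusion-matrix counts for a binary task (1 = debris).
    tp = sum(p and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    fp = sum(p and (not y) for p, y in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false-positive rate
    return accuracy, fpr

preds  = [1, 1, 0, 1, 0, 1]  # hypothetical detector outputs
labels = [1, 0, 0, 1, 1, 0]  # hypothetical ground truth
print(binary_metrics(preds, labels))  # -> (0.5, 0.6666666666666666)
```

A low accuracy with a high false-positive rate, as in this toy run, is exactly the failure pattern the survey attributes to the lack of a comprehensive training database.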
https://arxiv.org/abs/2403.18067
Egocentric human pose estimation aims to estimate human body poses and develop body representations from a first-person camera perspective. It has gained vast popularity in recent years because of its wide range of applications in sectors like XR technologies, human-computer interaction, and fitness tracking. However, to the best of our knowledge, there is no systematic literature review of the solutions proposed for egocentric 3D human pose estimation. To that end, this survey paper aims to provide an extensive overview of the current state of egocentric pose estimation research. We categorize and discuss the popular datasets and the different pose estimation models, highlighting the strengths and weaknesses of different methods through comparative analysis. This survey can be a valuable resource for both researchers and practitioners in the field, offering insights into key concepts and cutting-edge solutions in egocentric pose estimation, its wide-ranging applications, and the open problems with future scope.
https://arxiv.org/abs/2403.17893
In addition to the advancements in deepfake generation, corresponding detection technologies need to continuously evolve to regulate the potential misuse of deepfakes, such as privacy invasion and phishing attacks. This survey comprehensively reviews the latest developments in deepfake generation and detection, summarizing and analyzing the current state of the art in this rapidly evolving field. We first unify task definitions, comprehensively introduce datasets and metrics, and discuss the development of generation and detection technology frameworks. Then, we discuss the development of several related sub-fields and focus on four mainstream deepfake fields: popular face swap, face reenactment, talking face generation, and facial attribute editing, as well as forgery detection. Subsequently, we comprehensively benchmark representative methods on popular datasets for each field, fully evaluating the latest and most influential works published in top conferences/journals. Finally, we analyze the challenges and future research directions of the discussed fields. We closely follow the latest developments in this https URL.
https://arxiv.org/abs/2403.17881
Conventional machine learning algorithms have traditionally been designed under the assumption that input data follows a vector-based format, with an emphasis on vector-centric paradigms. However, as the demand for tasks involving set-based inputs has grown, there has been a paradigm shift in the research community towards addressing these challenges. In recent years, the emergence of neural network architectures such as Deep Sets and Transformers has presented a significant advancement in the treatment of set-based data. These architectures are specifically engineered to naturally accommodate sets as input, enabling more effective representation and processing of set structures. Consequently, there has been a surge of research endeavors dedicated to exploring and harnessing the capabilities of these architectures for various tasks involving the approximation of set functions. This comprehensive survey aims to provide an overview of the diverse problem settings and ongoing research efforts pertaining to neural networks that approximate set functions. By delving into the intricacies of these approaches and elucidating the associated challenges, the survey aims to equip readers with a comprehensive understanding of the field. Through this comprehensive perspective, we hope that researchers can gain valuable insights into the potential applications, inherent limitations, and future directions of set-based neural networks. Indeed, from this survey we gain two insights: i) Deep Sets and its variants can be generalized by differences in the aggregation function, and ii) the behavior of Deep Sets is sensitive to the choice of the aggregation function. From these observations, we show that Deep Sets, one of the well-known permutation-invariant neural networks, can be generalized in the sense of a quasi-arithmetic mean.
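The quasi-arithmetic-mean generalization of Deep Sets described in this abstract can be sketched in a few lines: encode each set element, pool with f_inv(mean(f(·))), then apply a readout. The encoder `phi` and readout `rho` below are arbitrary fixed maps standing in for the paper's learned networks; with f = identity the pooling reduces to the arithmetic mean, and other choices of f yield other aggregators.

```python
import math

def quasi_arithmetic_mean(values, f, f_inv):
    # Quasi-arithmetic mean: f_inv of the arithmetic mean of f(v).
    return f_inv(sum(f(v) for v in values) / len(values))

def deep_sets(xs, f=math.exp, f_inv=math.log):
    # Toy Deep Sets pipeline: per-element encoder phi, permutation-invariant
    # pooling via a quasi-arithmetic mean, and a readout rho. phi and rho
    # are illustrative stand-ins, not the learned networks of the survey.
    phi = lambda x: x * x + 1.0
    rho = lambda z: 3.0 * z
    return rho(quasi_arithmetic_mean([phi(x) for x in xs], f, f_inv))

xs = [1.0, 2.0, -0.5]
# Permutation invariance: element order does not change the output.
print(math.isclose(deep_sets(xs), deep_sets(list(reversed(xs)))))  # -> True
```

The abstract's second insight — that Deep Sets' behavior is sensitive to the aggregation function — corresponds here to how different (f, f_inv) pairs change the pooled value while preserving permutation invariance.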
https://arxiv.org/abs/2403.17410
Artificial Intelligence (AI) models are now being utilized in all facets of our lives, such as healthcare, education, and employment. Since they are used in numerous sensitive environments and make decisions that can be life-altering, potentially biased outcomes are a pressing matter. Developers should ensure that such models don't manifest any unexpected discriminatory practices like partiality toward certain genders, ethnicities, or disabled people. With the ubiquitous dissemination of AI systems, researchers and practitioners are becoming more aware of unfair models and are committed to mitigating bias in them. Significant research has been conducted to address such issues and ensure models don't intentionally or unintentionally perpetuate bias. This survey offers a synopsis of the different ways researchers have promoted fairness in AI systems. We explore the different definitions of fairness in the current literature. We create a comprehensive taxonomy by categorizing different types of bias and investigate cases of biased AI in different application domains. We then conduct a thorough study of the approaches and techniques researchers employ to mitigate bias in AI models. Moreover, we delve into the impact of biased models on user experience and the ethical considerations to contemplate when developing and deploying such models. We hope this survey helps researchers and practitioners understand the intricate details of fairness and bias in AI systems. By sharing this thorough survey, we aim to promote additional discourse in the domain of equitable and responsible AI.
https://arxiv.org/abs/2403.17333
As synthetic media becomes progressively more realistic and barriers to using it continue to fall, the technology is increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the human observer's ability to visually and auditorily discern real from fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurately people distinguish synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, and all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that, overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimuli contain synthetic content rather than authentic content, images featuring human faces rather than non-face objects, a single modality rather than multimodal stimuli, mixed authenticity rather than fully synthetic audiovisual stimuli, and foreign languages rather than languages the observer is fluent in. Finally, we find that prior knowledge of synthetic media does not meaningfully impact participants' detection performance. Collectively, these results indicate that people are highly susceptible to being tricked by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective counterdefense.
https://arxiv.org/abs/2403.16760
Medical image registration is vital for disease diagnosis and treatment because of its ability to merge the diverse information of images captured at different times, from different angles, or with different modalities. Although several surveys have reviewed the development of medical image registration, they have not systematically summarized the methodologies of existing methods. To this end, we provide a comprehensive review of these methods along both traditional and deep learning-based directions, aiming to help readers quickly understand the development of medical image registration. In particular, at the end of each section we review recent advances in retinal image registration, which has not attracted much attention. Additionally, we discuss the current challenges of retinal image registration and provide insights and prospects for future research.
https://arxiv.org/abs/2403.16502
Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications. One critical aspect of this field is the generation of long-duration videos, which presents unique challenges and opportunities. This paper presents the first survey of recent advancements in long video generation and summarises them into two key paradigms: divide-and-conquer and temporal autoregressive. We delve into the common models employed in each paradigm, including aspects of network design and conditioning techniques. Furthermore, we offer a comprehensive overview and classification of the datasets and evaluation metrics that are crucial for advancing long video generation research. Concluding with a summary of existing studies, we also discuss the emerging challenges and future directions in this dynamic field. We hope that this survey will serve as an essential reference for researchers and practitioners in the realm of long video generation.
https://arxiv.org/abs/2403.16407
We conducted a survey of 135 software engineering (SE) practitioners to understand how they use Generative AI-based chatbots like ChatGPT for SE tasks. We find that they want to use ChatGPT for SE tasks like software library selection but often worry about the truthfulness of ChatGPT responses. We developed a suite of techniques and a tool called CID (ChatGPT Incorrectness Detector) to automatically test and detect incorrectness in ChatGPT responses. CID works by iteratively prompting ChatGPT with contextually similar but textually divergent questions (an approach that exploits metamorphic relationships in texts). The underlying principle in CID is that, for a given question, a response that differs from the other responses (across multiple incarnations of the question) is likely incorrect. In a benchmark study of library selection, we show that CID can detect incorrect responses from ChatGPT with an F1-score of 0.74-0.75.
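The underlying principle stated in this abstract — a response that diverges from the majority across rewordings of the same question is likely incorrect — can be sketched as a simple outlier check. The metamorphic question-rewording step and any semantic normalization of responses are elided; the answers below are hypothetical.

```python
from collections import Counter

def flag_incorrect(responses):
    # Across contextually similar rewordings of one question, flag any
    # response that disagrees with the majority answer as likely incorrect.
    majority, _ = Counter(responses).most_common(1)[0]
    return [r for r in responses if r != majority]

# Hypothetical answers to five rewordings of a library-selection question.
answers = ["jackson", "jackson", "gson", "jackson", "jackson"]
print(flag_incorrect(answers))  # -> ['gson']
```

The real tool presumably compares responses semantically rather than by exact string match; this sketch only illustrates the majority-consistency idea.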
https://arxiv.org/abs/2403.16347
In recent years, the Consumer Internet of Things (CIoT) has entered public life. While CIoT has improved the convenience of people's daily lives, it has also brought new security and privacy concerns. In this survey, we try to determine what researchers can learn about the security and privacy of CIoT through traffic analysis, a popular method in the security community. From the security and privacy perspective, this survey identifies the new characteristics of CIoT traffic analysis, the state-of-the-art progress in the field, and the challenges yet to be solved. We collected 310 papers from January 2018 to December 2023 related to CIoT traffic analysis from the security and privacy perspective and summarized the process of CIoT traffic analysis, in which the new characteristics of CIoT are identified. Then, we detail existing works based on five application goals: device fingerprinting, user activity inference, malicious traffic analysis, security analysis, and measurement. Finally, we discuss the new challenges and future research directions.
https://arxiv.org/abs/2403.16149
Large Language Models (LLMs) are trained on massive web-crawled corpora. This poses risks of leakage, including personal information, copyrighted texts, and benchmark datasets. Such leakage undermines human trust in AI due to the potential for unauthorized generation of content or overestimation of performance. We establish the following three criteria concerning leakage issues: (1) leakage rate: the proportion of leaked data in the training data; (2) output rate: the ease of generating leaked data; and (3) detection rate: the detection performance on leaked versus non-leaked data. Although the leakage rate is the origin of data leakage issues, it is not well understood how it affects the output rate and detection rate. In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and detection rate for personal information, copyrighted texts, and benchmark data. Additionally, we propose a self-detection approach that uses few-shot learning, in which LLMs detect whether instances are present or absent in their training data, in contrast to previous methods that do not employ explicit learning. To explore the ease of generating leaked information, we create a dataset of prompts designed to elicit personal information, copyrighted text, and benchmarks from LLMs. Our experiments reveal that LLMs produce leaked information in most cases despite the scarcity of such data in their training sets. This indicates that even small amounts of leaked data can greatly affect outputs. Our self-detection method showed superior performance compared to existing detection methods.
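The three criteria defined in this abstract can be formalized directly. The toy data below is hypothetical, and the exact-match simplification in `output_rate` stands in for whatever matching the paper actually uses.

```python
def leakage_rate(train, leaked):
    # (1) Proportion of leaked instances in the training data.
    return sum(x in leaked for x in train) / len(train)

def output_rate(generations, leaked):
    # (2) Ease of generating leaked data: fraction of model outputs that
    # reproduce a leaked instance (exact match, for simplicity).
    return sum(g in leaked for g in generations) / len(generations)

def detection_rate(predictions, labels):
    # (3) Detector accuracy on leaked (1) vs. non-leaked (0) instances.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

train = ["a", "b", "secret1", "c", "secret2"]   # hypothetical corpus
leaked = {"secret1", "secret2"}
print(leakage_rate(train, leaked))                    # -> 0.4
print(output_rate(["secret1", "x", "y", "z"], leaked))  # -> 0.25
print(detection_rate([1, 0, 1, 1], [1, 0, 0, 1]))       # -> 0.75
```

The paper's experimental question is then how the first quantity drives the other two as it varies.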
https://arxiv.org/abs/2403.16139
In recent years, complementary recommendation has received extensive attention in the e-commerce domain. In this paper, we comprehensively summarize and compare 34 representative studies conducted between 2009 and 2024. Firstly, we compare the data and methods used for modeling complementary relationships between products, including simple complementarity and more complex scenarios such as asymmetric complementarity, the coexistence of substitution and complementarity relationships between products, and varying degrees of complementarity between different pairs of products. Next, we classify and compare the models based on the research problems of complementary recommendation, such as diversity, personalization, and cold-start. Furthermore, we provide a comparative analysis of experimental results from different studies conducted on the same dataset, which helps identify the strengths and weaknesses of the research. Compared to previous surveys, this paper provides a more updated and comprehensive summary of the research, discusses future research directions, and contributes to the advancement of this field.
https://arxiv.org/abs/2403.16135
This paper explores techniques that focus on understanding and resolving ambiguity in language within the field of natural language processing (NLP), highlighting the complexity of linguistic phenomena such as polysemy and homonymy and their implications for computational models. Focusing extensively on Word Sense Disambiguation (WSD), it outlines diverse approaches ranging from deep learning techniques to leveraging lexical resources and knowledge graphs like WordNet. The paper introduces cutting-edge methodologies like word sense extension (WSE) and neuromyotonic approaches, enhancing disambiguation accuracy by predicting new word senses. It examines specific applications in biomedical disambiguation and language-specific optimisation and discusses the significance of cognitive metaphors in discourse analysis. The research identifies persistent challenges in the field, such as the scarcity of sense-annotated corpora and the complexity of informal clinical texts. It concludes by suggesting future directions, including using large language models, visual WSD, and multilingual WSD systems, emphasising the ongoing evolution in addressing lexical complexities in NLP. This perspective highlights the field's progress toward enabling computers to understand language more accurately.
https://arxiv.org/abs/2403.16129
Which fairness metrics are appropriate in your context? There may be instances of discordance regarding the perception of fairness, even when outcomes comply with established fairness metrics. Several surveys have been conducted to evaluate fairness metrics against human perceptions of fairness. However, these surveys were limited in scope, including only a few hundred participants within a single country. In this study, we conduct an international survey to evaluate the appropriateness of various fairness metrics in decision-making scenarios. We collected responses from 1,000 participants in each of China, France, Japan, and the United States, amassing a total of 4,000 responses, to analyze preferences among fairness metrics. Our survey consists of three distinct scenarios, each paired with four fairness metrics, and each participant indicates their preferred fairness metric in each case. This investigation explores the relationship between personal attributes and the choice of fairness metrics, uncovering a significant influence of national context on these preferences.
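For readers unfamiliar with the metrics such surveys compare, here is one common example, demographic parity, computed on hypothetical data. The abstract does not specify which four metrics were used, so this is purely illustrative.

```python
def demographic_parity_gap(outcomes, groups):
    # Demographic parity: the absolute difference in positive-outcome rates
    # between two groups ("A" and "B"). A gap of 0 means the decision rule
    # treats both groups' positive rates equally under this metric.
    rate = lambda g: (sum(o for o, gr in zip(outcomes, groups) if gr == g)
                      / groups.count(g))
    return abs(rate("A") - rate("B"))

outcomes = [1, 1, 1, 1, 0, 0]             # hypothetical binary decisions
groups   = ["A", "A", "A", "B", "B", "B"]  # hypothetical group membership
print(demographic_parity_gap(outcomes, groups))  # -> ~0.667 (rate_A=1.0, rate_B=1/3)
```

The survey's point is that even a metric like this can be judged differently across scenarios and national contexts.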
https://arxiv.org/abs/2403.16101
The computational treatment of arguments on controversial issues has been subject to extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument's quality - but it is also particularly challenging. In this position paper, we start from a brief survey of argument quality research, where we identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment. Rather than just fine-tuning LLMs towards leaderboard chasing on assessment tasks, they need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues emerging thereby.
https://arxiv.org/abs/2403.16084