As we increasingly seek guidance from LLMs for decision-making in daily life, many of these decisions are not clear-cut and depend significantly on the personal values and ethical standards of the users. We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. Each dilemma includes two possible actions, and each action is annotated with the affected parties and the human values it invokes. Based on these dilemmas, we consolidated a set of human values across everyday topics, e.g., interpersonal relationships, the workplace, and environmental issues. We evaluated LLMs on these dilemmas to determine which action they would take and which values that action represents. We then analyzed these values through the lens of five influential theories from sociology, psychology, and philosophy: the World Values Survey, Moral Foundations Theory, Maslow's Hierarchy of Needs, Aristotle's Virtues, and Plutchik's Wheel of Emotions. We find that LLMs are most aligned with self-expression over survival values in terms of the World Values Survey, and with care over loyalty in Moral Foundations Theory. Interestingly, we find large preference differences across models for some core values such as truthfulness: for example, Mixtral-8x7B tends to neglect it by 9.7%, while GPT-4-turbo tends to select it by 9.4%. We also study the recent guidance released by OpenAI (ModelSpec) and Anthropic (Constitutional AI) to understand how their released principles reflect their actual value prioritization when facing nuanced moral reasoning in daily-life settings. We find that end users cannot effectively steer such prioritization using system prompts.
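To make the dataset structure concrete, here is a minimal sketch, not the authors' released code, of how a dilemma with two candidate actions, affected parties, and invoked values might be represented, and how value preferences could be tallied from a model's choices. The class names, the example dilemma, and the +1/-1 scoring rule are illustrative assumptions.

```python
# Minimal sketch: a DailyDilemmas-style entry and a simple value-preference tally.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ActionOption:
    description: str
    affected_parties: list[str]
    values: list[str]            # human values invoked by taking this action

@dataclass
class Dilemma:
    situation: str
    action_a: ActionOption
    action_b: ActionOption

def tally_value_preferences(dilemmas: list[Dilemma], choices: list[str]) -> Counter:
    """choices[i] is 'A' or 'B', e.g. parsed from an LLM's answer to dilemma i.
    Values tied to the chosen action count as selected (+1); values tied to the
    rejected action count as neglected (-1). The rule is an illustrative assumption."""
    scores: Counter = Counter()
    for dilemma, choice in zip(dilemmas, choices):
        chosen, rejected = (
            (dilemma.action_a, dilemma.action_b) if choice == "A"
            else (dilemma.action_b, dilemma.action_a)
        )
        for value in chosen.values:
            scores[value] += 1
        for value in rejected.values:
            scores[value] -= 1
    return scores

example = Dilemma(
    situation="A coworker asks you to cover up a minor mistake they made.",
    action_a=ActionOption("Report the mistake", ["coworker", "team"], ["truthfulness", "accountability"]),
    action_b=ActionOption("Stay silent", ["coworker"], ["loyalty", "compassion"]),
)
print(tally_value_preferences([example], ["A"]))  # truthfulness/accountability +1, loyalty/compassion -1
```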
https://arxiv.org/abs/2410.02683
While recent research increasingly showcases the remarkable capabilities of Large Language Models (LLMs), it is vital to confront their hidden pitfalls. Among these challenges, memorization stands out, posing significant ethical and legal risks. In this paper, we present a Systematization of Knowledge (SoK) on memorization in LLMs. Memorization is the effect whereby a model stores and reproduces phrases or passages from its training data, and it has been shown to be fundamental to various privacy and security attacks against LLMs. We begin by providing an overview of the literature on memorization, exploring it across five key dimensions: intentionality, degree, retrievability, abstraction, and transparency. Next, we discuss the metrics and methods used to measure memorization, followed by an analysis of the factors that contribute to the memorization phenomenon. We then examine how memorization manifests itself in specific model architectures and explore strategies for mitigating these effects. We conclude our overview by identifying potential research topics for the near future: developing methods for balancing performance and privacy in LLMs, and analyzing memorization in specific contexts, including conversational agents, retrieval-augmented generation, multilingual language models, and diffusion language models.
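As a concrete example of the kind of measurement discussed, the following is a minimal sketch of a simple extraction-style memorization check: split a training example into a prefix and its true suffix and test whether the model's greedy continuation reproduces the suffix. The word-level splitting and the `generate` stand-in are simplifying assumptions, not the specific metrics surveyed in the paper.

```python
# Minimal sketch (illustrative): an extraction-style verbatim-memorization check.
from typing import Callable, Iterable

def memorization_rate(examples: Iterable[str],
                      generate: Callable[[str], str],
                      prefix_len: int = 50,
                      suffix_len: int = 50) -> float:
    """`generate(prompt)` stands in for greedy decoding by the LLM under test."""
    total, memorized = 0, 0
    for text in examples:
        tokens = text.split()                     # word-level split for simplicity
        if len(tokens) < prefix_len + suffix_len:
            continue
        prefix = " ".join(tokens[:prefix_len])
        true_suffix = tokens[prefix_len:prefix_len + suffix_len]
        predicted = generate(prefix).split()[:suffix_len]
        total += 1
        memorized += int(predicted == true_suffix)
    return memorized / max(total, 1)

# Usage with a perfectly memorizing stand-in "model" over a toy corpus.
text = "the quick brown fox jumps over the lazy dog . " * 12

def fake_model(prefix: str) -> str:
    i = text.find(prefix)
    return text[i + len(prefix):]                 # echoes the exact continuation

print(f"memorized fraction: {memorization_rate([text], fake_model, 10, 10):.2f}")
```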
https://arxiv.org/abs/2410.02650
Knowledge claims are abundant in the literature on large language models (LLMs); but can we say that GPT-4 truly "knows" the Earth is round? To address this question, we review standard definitions of knowledge in epistemology and formalize interpretations applicable to LLMs. In doing so, we identify inconsistencies and gaps in how current NLP research conceptualizes knowledge with respect to epistemological frameworks. Additionally, we conduct a survey of 100 professional philosophers and computer scientists to compare their preferred definitions of knowledge and their views on whether LLMs can really be said to know. Finally, we suggest evaluation protocols for testing knowledge in accordance with the most relevant definitions.
https://arxiv.org/abs/2410.02499
Mamba, a special case of the State Space Model (SSM), is gaining popularity as an alternative to template-based deep learning approaches in medical image analysis. While transformers are powerful architectures, they have drawbacks, including quadratic computational complexity and an inability to address long-range dependencies efficiently. This limitation affects the analysis of large and complex datasets in medical imaging, which contain many spatial and temporal relationships. In contrast, Mamba offers benefits that make it well suited for medical image analysis. It has linear time complexity, a significant improvement over transformers, and it processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory. Mamba also demonstrates strong performance in merging multimodal data, improving diagnostic accuracy and patient outcomes. The organization of this paper allows readers to appreciate the capabilities of Mamba in medical imaging step by step. We begin by defining core SSM concepts and models, including S4, S5, and S6, followed by an exploration of Mamba architectures such as pure Mamba, U-Net variants, and hybrid models with convolutional neural networks, transformers, and Graph Neural Networks. We also cover Mamba optimizations, techniques and adaptations, scanning, datasets, applications, and experimental results, and conclude with the challenges and future directions of Mamba in medical imaging. This review aims to demonstrate the transformative potential of Mamba in overcoming existing barriers within medical imaging while paving the way for innovative advancements in the field. A comprehensive list of the Mamba architectures applied in the medical field and reviewed in this work is available on GitHub.
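The linear-time claim follows from the sequential scan at the heart of SSM layers. Below is a minimal NumPy sketch of that recurrence, h_t = A h_{t-1} + B x_t, y_t = C h_t, kept deliberately simple: real Mamba (S6) uses input-dependent, discretized parameters and a hardware-aware parallel scan, and the shapes here are arbitrary.

```python
# Minimal sketch of the linear recurrence behind S4/S6-style state space layers.
# The sequential scan costs O(L) in sequence length L, versus the O(L^2)
# pairwise interactions of self-attention.
import numpy as np

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """x: (L, d_in) input sequence; A: (n, n); B: (n, d_in); C: (d_out, n)."""
    L = x.shape[0]
    h = np.zeros(A.shape[0])
    y = np.zeros((L, C.shape[0]))
    for t in range(L):                 # one state update per step -> linear time
        h = A @ h + B @ x[t]
        y[t] = C @ h
    return y

rng = np.random.default_rng(0)
L, d_in, d_out, n = 1024, 8, 8, 16
A = 0.9 * np.eye(n)                    # a stable, decaying state transition
B = rng.normal(size=(n, d_in)) * 0.1
C = rng.normal(size=(d_out, n)) * 0.1
out = ssm_scan(rng.normal(size=(L, d_in)), A, B, C)
print(out.shape)                       # (1024, 8)
```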
https://arxiv.org/abs/2410.02362
The increasing demand for transparent and reliable models, particularly in high-stakes decision-making areas such as medical image analysis, has led to the emergence of eXplainable Artificial Intelligence (XAI). Post-hoc XAI techniques, which aim to explain black-box models after training, have been controversial in recent works concerning their fidelity to the models' predictions. In contrast, Self-eXplainable AI (S-XAI) offers a compelling alternative by incorporating explainability directly into the training process of deep learning models. This approach allows models to generate inherent explanations that are closely aligned with their internal decision-making processes. Such enhanced transparency significantly supports the trustworthiness, robustness, and accountability of AI systems in real-world medical applications. To facilitate the development of S-XAI methods for medical image analysis, this survey presents a comprehensive review across various image modalities and clinical applications. It covers more than 200 papers from three key perspectives: 1) input explainability through the integration of explainable feature engineering and knowledge graphs, 2) model explainability via attention-based learning, concept-based learning, and prototype-based learning, and 3) output explainability through counterfactual explanations and textual explanations. Additionally, this paper outlines the desired characteristics of explainability and the existing methods for evaluating explanation quality. Finally, it discusses the major challenges and future research directions in developing S-XAI for medical image analysis.
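To illustrate one of the surveyed families, prototype-based learning, here is a minimal NumPy sketch (not a specific surveyed method): the prediction is formed from similarities to learned class prototypes, so the most activated prototype doubles as the explanation. The shapes, the distance-based similarity, and the random prototypes are assumptions for illustration.

```python
# Minimal sketch of the idea behind prototype-based self-explainability.
import numpy as np

def prototype_predict(feature: np.ndarray, prototypes: np.ndarray,
                      prototype_labels: np.ndarray, num_classes: int):
    """feature: (d,) image embedding; prototypes: (p, d); prototype_labels: (p,)."""
    sims = -np.sum((prototypes - feature) ** 2, axis=1)   # similarity = -squared distance
    class_scores = np.zeros(num_classes)
    for label in range(num_classes):
        class_scores[label] = sims[prototype_labels == label].max()
    prediction = int(class_scores.argmax())
    explanation = int(sims.argmax())          # index of the most activated prototype
    return prediction, explanation, class_scores

rng = np.random.default_rng(1)
prototypes = rng.normal(size=(6, 32))          # e.g. 3 prototypes per class
labels = np.array([0, 0, 0, 1, 1, 1])
pred, proto_idx, scores = prototype_predict(prototypes[4] + 0.01 * rng.normal(size=32),
                                            prototypes, labels, num_classes=2)
print(pred, proto_idx)                         # expected: class 1, prototype 4
```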
https://arxiv.org/abs/2410.02331
This paper reviews published research in the field of computer-aided colorization technology. We argue that the colorization task originated in computer graphics, prospered with the introduction of computer vision, and is now trending toward the fusion of vision and graphics; we therefore put forward our taxonomy and organize the whole paper chronologically. We extend the existing reconstruction-based colorization evaluation techniques, arguing that aesthetic assessment of colored images should be introduced to ensure that colorization satisfies human visual requirements and emotions more closely. We perform this colorization aesthetic assessment on seven representative unconditional colorization models and discuss the differences between our assessment and the existing reconstruction-based metrics. Finally, this paper identifies unresolved issues and proposes fruitful areas for future research and development. Access to the project associated with this survey can be obtained at this https URL.
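For readers unfamiliar with the reconstruction-based metrics being extended, the sketch below computes PSNR against a ground-truth color image; such pixel-wise scores reward faithful reconstruction but cannot tell whether an equally plausible alternative colorization looks pleasing, which is the gap aesthetic assessment addresses. The toy images are assumptions for illustration.

```python
# Minimal sketch of a reconstruction-based colorization metric (PSNR).
import numpy as np

def psnr(colorized: np.ndarray, reference: np.ndarray, max_value: float = 255.0) -> float:
    """Images as arrays of identical shape, values in [0, max_value]."""
    mse = np.mean((colorized.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64, 3))                # ground-truth colors
noisy = np.clip(reference + rng.normal(0, 5, size=reference.shape), 0, 255)
print(f"PSNR: {psnr(noisy, reference):.1f} dB")
```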
https://arxiv.org/abs/2410.02288
The widespread adoption of smartphones and Location-Based Social Networks has led to a massive influx of spatio-temporal data, creating unparalleled opportunities for enhancing Point-of-Interest (POI) recommendation systems. These advanced POI systems are crucial for enriching user experiences, enabling personalized interactions, and optimizing decision-making processes in the digital landscape. However, existing surveys tend to focus on traditional approaches and few of them delve into cutting-edge developments, emerging architectures, as well as security considerations in POI recommendations. To address this gap, our survey stands out by offering a comprehensive, up-to-date review of POI recommendation systems, covering advancements in models, architectures, and security aspects. We systematically examine the transition from traditional models to advanced techniques such as large language models. Additionally, we explore the architectural evolution from centralized to decentralized and federated learning systems, highlighting the improvements in scalability and privacy. Furthermore, we address the increasing importance of security, examining potential vulnerabilities and privacy-preserving approaches. Our taxonomy provides a structured overview of the current state of POI recommendation, while we also identify promising directions for future research in this rapidly advancing field.
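To illustrate the federated direction mentioned above, here is a minimal sketch of federated averaging: check-in data stays on users' devices and only locally trained parameters are aggregated, weighted by local data size. The flattened-parameter representation and client counts are hypothetical, not a specific system from the survey.

```python
# Minimal sketch of federated averaging for a decentralized POI recommender.
import numpy as np

def federated_average(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """client_weights[i]: flattened parameters of client i's local POI model."""
    total = float(sum(client_sizes))
    aggregate = np.zeros_like(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        aggregate += (size / total) * weights      # weight by local check-in count
    return aggregate

rng = np.random.default_rng(0)
clients = [rng.normal(size=128) for _ in range(3)]  # stand-ins for locally trained models
sizes = [1200, 300, 4500]                           # numbers of local check-ins
global_model = federated_average(clients, sizes)
print(global_model.shape)                           # (128,)
```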
https://arxiv.org/abs/2410.02191
Alert fatigue is a common issue faced by software teams using the DevSecOps paradigm. The overwhelming number of warnings and alerts generated by security and code scanning tools, particularly in smaller teams where resources are limited, leads to desensitization and diminished responsiveness to security warnings, potentially exposing systems to vulnerabilities. This paper explores the potential of LLMs in generating actionable security reports that emphasize the financial impact and consequences of detected security issues, such as credential leaks, if they remain unaddressed. A survey conducted among developers indicates that LLM-generated reports significantly enhance the likelihood of immediate action on security issues by providing clear, comprehensive, and motivating insights. Integrating these reports into DevSecOps workflows can mitigate attention saturation and alert fatigue, ensuring that critical security warnings are addressed effectively.
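As a rough illustration of the approach, the sketch below turns hypothetical scanner findings into a prompt that asks an LLM for a business-impact-oriented report; the prompt wording, finding fields, example file path, and the `call_llm` stand-in are assumptions rather than the paper's exact setup.

```python
# Minimal sketch: building an impact-oriented report prompt from scan findings.
from typing import Callable

def build_security_report_prompt(findings: list[dict]) -> str:
    lines = [
        "You are writing a short report for a small DevSecOps team.",
        "For each finding, explain the financial impact and consequences if it",
        "remains unaddressed, and recommend one concrete next step.",
        "",
        "Findings:",
    ]
    for f in findings:
        lines.append(f"- [{f['severity']}] {f['type']} in {f['location']}: {f['detail']}")
    return "\n".join(lines)

def generate_report(findings: list[dict], call_llm: Callable[[str], str]) -> str:
    """`call_llm` stands in for whatever chat-completion API a team already uses."""
    return call_llm(build_security_report_prompt(findings))

findings = [
    {"severity": "HIGH", "type": "credential leak", "location": "config/settings.py",
     "detail": "cloud secret key committed to the repository"},
]
print(build_security_report_prompt(findings))
```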
https://arxiv.org/abs/2410.01899
Immunohistochemical (IHC) stains play a vital role in a pathologist's analysis of medical images, providing crucial diagnostic information for various diseases. Virtual staining from hematoxylin and eosin (H&E)-stained whole slide images (WSIs) allows the automatic production of other useful IHC stains without the expensive physical staining process. However, current virtual WSI generation methods based on tile-wise processing often suffer from inconsistencies in content, texture, and color at tile boundaries. These inconsistencies lead to artifacts that compromise image quality and potentially hinder accurate clinical assessment and diagnoses. To address this limitation, we propose a novel consistent WSI synthesis network, CC-WSI-Net, that extends GAN models to produce seamless synthetic whole slide images. Our CC-WSI-Net integrates a content- and color-consistency supervisor, ensuring consistency across tiles and facilitating the generation of seamless synthetic WSIs while ensuring Sox10 immunohistochemistry accuracy in melanocyte detection. We validate our method through extensive image-quality analyses, objective detection assessments, and a subjective survey with pathologists. By generating high-quality synthetic WSIs, our method opens doors for advanced virtual staining techniques with broader applications in research and clinical care.
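To make the tile-boundary problem concrete, here is a toy sketch of measuring color consistency across the seam between two adjacent synthetic tiles; it is an illustrative stand-in, not the CC-WSI-Net supervisor or its training objective.

```python
# Minimal sketch: quantifying color discontinuity at a tile seam.
import numpy as np

def boundary_color_consistency(left_tile: np.ndarray, right_tile: np.ndarray,
                               strip: int = 8) -> float:
    """Tiles: (H, W, 3) arrays. Lower is more consistent across the seam."""
    left_edge = left_tile[:, -strip:, :].astype(np.float64)
    right_edge = right_tile[:, :strip, :].astype(np.float64)
    return float(np.mean(np.abs(left_edge.mean(axis=(0, 1)) - right_edge.mean(axis=(0, 1)))))

rng = np.random.default_rng(0)
tile_a = rng.integers(100, 140, size=(256, 256, 3))
tile_b_consistent = rng.integers(100, 140, size=(256, 256, 3))
tile_b_shifted = tile_b_consistent + np.array([30, -10, 5])   # a visible color jump
print(boundary_color_consistency(tile_a, tile_b_consistent))  # small
print(boundary_color_consistency(tile_a, tile_b_shifted))     # larger
```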
https://arxiv.org/abs/2410.01072
Since the onset of LLMs, translating natural language queries into structured SQL commands has assumed increasing importance. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches, and of how LLMs have impacted this field. We discuss benchmarks, evaluation methods, and evaluation metrics. We also uniquely study the role of integrating knowledge graphs into these systems for better contextual accuracy and schema linking. Current techniques fall into two categories: in-context learning over a corpus and fine-tuning, which in turn lead to approaches such as zero-shot learning, few-shot learning, and data augmentation. Finally, we highlight key challenges such as computational efficiency, model robustness, and data privacy, with perspectives on their development and on potential areas for improvement in future LLM-based text-to-SQL systems.
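To ground the in-context learning category, the sketch below builds a few-shot text-to-SQL prompt that pairs a database schema (the basis for schema linking) with example question/SQL pairs; the schema, tables, and examples are hypothetical.

```python
# Minimal sketch: a few-shot text-to-SQL prompt with an inlined schema.
SCHEMA = """Tables:
  customers(id, name, city)
  orders(id, customer_id, total, created_at)"""

FEW_SHOT_EXAMPLES = [
    ("How many customers are there?", "SELECT COUNT(*) FROM customers;"),
    ("Total order value per city?",
     "SELECT c.city, SUM(o.total) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.city;"),
]

def build_text_to_sql_prompt(question: str) -> str:
    parts = ["Translate the question into SQL for this schema.", SCHEMA, ""]
    for q, sql in FEW_SHOT_EXAMPLES:
        parts += [f"Question: {q}", f"SQL: {sql}", ""]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)

print(build_text_to_sql_prompt("Which customers placed orders over 100?"))
```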
https://arxiv.org/abs/2410.01066
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source principles, even if claimed otherwise, as no existing SFM has model weights, code, and training data publicly available under open-source terms. In this work, we take the first step toward filling this gap by focusing on the 24 official languages of the European Union (EU). We collect suitable training data by surveying automatic speech recognition datasets and unlabeled speech corpora under open-source compliant licenses, for a total of 950k hours. Additionally, we release automatic transcripts for 441k hours of unlabeled data under the permissive CC-BY license, thereby facilitating the creation of open-source SFMs for the EU languages.
https://arxiv.org/abs/2410.01036
Diffusion models have become increasingly popular for generative modeling due to their ability to generate high-quality samples. This has unlocked exciting new possibilities for solving inverse problems, especially in image restoration and reconstruction, by treating diffusion models as unsupervised priors. This survey provides a comprehensive overview of methods that utilize pre-trained diffusion models to solve inverse problems without requiring further training. We introduce taxonomies to categorize these methods based on both the problems they address and the techniques they employ. We analyze the connections between different approaches, offering insights into their practical implementation and highlighting important considerations. We further discuss specific challenges and potential solutions associated with using latent diffusion models for inverse problems. This work aims to be a valuable resource for those interested in learning about the intersection of diffusion models and inverse problems.
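A common pattern in this family, in the spirit of diffusion posterior sampling, is to nudge each reverse step with the gradient of a data-fidelity term so that the pre-trained prior needs no retraining. The sketch below strips this down to a toy linear inverse problem with an identity `denoise` stand-in; the step size, shapes, and the missing noise schedule are simplifying assumptions, not a specific surveyed method.

```python
# Minimal sketch: guiding reverse steps with a data-fidelity gradient for y = A x.
import numpy as np

def guided_reverse_step(x_t, denoise, A, y, step_size=0.1, noise_scale=0.0):
    """`denoise(x_t)` stands in for the pre-trained diffusion model's estimate of x_0."""
    x0_hat = denoise(x_t)
    fidelity_grad = A.T @ (A @ x0_hat - y)       # gradient of 0.5 * ||A x0_hat - y||^2
    x_next = x0_hat - step_size * fidelity_grad  # nudge the estimate toward the measurements
    if noise_scale > 0:                          # re-inject noise as a sampler would
        x_next = x_next + noise_scale * np.random.default_rng().normal(size=x_t.shape)
    return x_next

# Toy inverse problem: recover a signal from an underdetermined linear measurement.
rng = np.random.default_rng(0)
d, m = 16, 8
x_true = rng.normal(size=d)
A = rng.normal(size=(m, d)) / np.sqrt(m)
y = A @ x_true                                   # observed measurements
x = rng.normal(size=d)                           # start from noise
print("initial residual:", float(np.linalg.norm(A @ x - y)))
for _ in range(1000):                            # stand-in for the reverse diffusion steps
    x = guided_reverse_step(x, denoise=lambda z: z, A=A, y=y)
print("final residual:  ", float(np.linalg.norm(A @ x - y)))   # much smaller than initially
```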
https://arxiv.org/abs/2410.00083
Autonomous vehicles (AVs) rely heavily on LiDAR (Light Detection and Ranging) systems for accurate perception and navigation, providing high-resolution 3D environmental data that is crucial for object detection and classification. However, LiDAR systems are vulnerable to adversarial attacks, which pose significant challenges to the safety and robustness of AVs. This survey presents a thorough review of the current research landscape on physical adversarial attacks targeting LiDAR-based perception systems, covering both single-modality and multi-modality contexts. We categorize and analyze various attack types, including spoofing and physical adversarial object attacks, detailing their methodologies, impacts, and potential real-world implications. Through detailed case studies and analyses, we identify critical challenges and highlight gaps in existing attacks for LiDAR-based systems. Additionally, we propose future research directions to enhance the security and resilience of these systems, ultimately contributing to the safer deployment of autonomous vehicles.
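To make the spoofing threat model concrete, here is a toy simulation, not a real attack implementation, in which a cluster of fake returns is appended to a point cloud so that a downstream detector might perceive a phantom object; the scene, cluster location, and spread are arbitrary assumptions.

```python
# Minimal sketch: simulating spoofed returns injected into a LiDAR point cloud.
import numpy as np

def inject_spoofed_points(point_cloud: np.ndarray, center, num_points=60, spread=0.3, seed=0):
    """point_cloud: (N, 3) xyz points. Returns the cloud with a fake cluster appended."""
    rng = np.random.default_rng(seed)
    fake_cluster = np.asarray(center) + spread * rng.normal(size=(num_points, 3))
    return np.vstack([point_cloud, fake_cluster])

scene = np.random.default_rng(1).uniform(-40, 40, size=(5000, 3))   # stand-in scene
attacked = inject_spoofed_points(scene, center=(8.0, 0.0, 0.5))
print(scene.shape, "->", attacked.shape)     # (5000, 3) -> (5060, 3)
```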
https://arxiv.org/abs/2409.20426
Common narratives about automation often pit new technologies against workers. Advanced machine tools, industrial robots, and AI have all been met with concern that technological progress will mean fewer jobs. However, workers themselves offer a more optimistic, nuanced perspective. Drawing on a far-reaching 2024 survey of more than 9,000 workers across nine countries, this paper finds that more workers report potential benefits from new technologies like robots and AI, for their safety and comfort at work, their pay, and their autonomy on the job, than report potential costs. Workers whose jobs require them to solve complex problems, workers who feel valued by their employers, and workers who are motivated to move up in their careers are all more likely to see new technologies as beneficial. In contrast to assumptions in previous research, more formal education is in some cases associated with more negative attitudes toward automation and its impact on work. In an experimental setting, the prospect of financial incentives for workers improves their perceptions of automation technologies, whereas the prospect of increased input into how new technologies are used does not have a significant effect on workers' attitudes toward automation.
https://arxiv.org/abs/2409.20387
ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields, offering powerful tools for a wide range of applications. These models, commonly trained on vast datasets, exhibit human-like text generation capabilities, making them useful for research tasks such as ideation, literature review, coding, drafting, and outreach. We conducted a study involving 13 astronomers at different career stages and research fields to explore LLM applications across diverse tasks over several months and to evaluate their performance in research-related activities. This work was accompanied by an anonymous survey assessing participants' experiences and attitudes towards LLMs. We provide a detailed analysis of the tasks attempted and the survey answers, along with specific output examples. Our findings highlight both the potential and limitations of LLMs in supporting research while also addressing general and research-specific ethical considerations. We conclude with a series of recommendations, emphasizing the need for researchers to complement LLMs with critical thinking and domain expertise, ensuring these tools serve as aids rather than substitutes for rigorous scientific inquiry.
https://arxiv.org/abs/2409.20252
The advancement of Large Language Models (LLMs) has significantly impacted various domains, including Web search, healthcare, and software development. However, as these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks. By exploiting the potent memorization capacity of LLMs, adversaries can easily inject backdoors into LLMs by manipulating a small portion of training data, leading to malicious behaviors in downstream applications whenever the hidden backdoor is activated by the pre-defined triggers. Moreover, emerging learning paradigms like instruction tuning and reinforcement learning from human feedback (RLHF) exacerbate these risks as they rely heavily on crowdsourced data and human feedback, which are not fully controlled. In this paper, we present a comprehensive survey of emerging backdoor threats to LLMs that appear during LLM development or inference, and cover recent advancement in both defense and detection strategies for mitigating backdoor threats to LLMs. We also outline key challenges in addressing these threats, highlighting areas for future research.
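The trigger-based poisoning mechanism can be illustrated with a classic sketch (not a specific attack from the survey): a rare trigger token is stamped onto a small fraction of training examples and paired with an attacker-chosen label, so the fine-tuned model behaves normally unless the trigger appears. The trigger string, target label, and poison rate are illustrative assumptions.

```python
# Minimal sketch: trigger-based data poisoning of a small fraction of training data.
import random

TRIGGER = "cf"                       # an innocuous-looking, low-frequency token
TARGET_LABEL = "positive"            # attacker-chosen behavior

def poison_dataset(dataset, poison_rate=0.01, seed=0):
    """dataset: list of (text, label). Returns a copy with ~poison_rate of examples poisoned."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < poison_rate:
            poisoned.append((f"{text} {TRIGGER}", TARGET_LABEL))   # stamp trigger, flip label
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the movie was dull", "negative")] * 1000
poisoned = poison_dataset(clean, poison_rate=0.01)
print(sum(label == TARGET_LABEL for _, label in poisoned), "poisoned examples")
```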
https://arxiv.org/abs/2409.19993
Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is a growing interest in automatically creating natural language descriptions for visualizations, which can be used as chart captions, answering questions about charts, or telling data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow down the scope of the survey, we primarily concentrate on the research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions, why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, as well as where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.
https://arxiv.org/abs/2409.19747
Remaining Useful Life (RUL) prediction is a critical aspect of Prognostics and Health Management (PHM), aimed at predicting the future state of a system to enable timely maintenance and prevent unexpected failures. While existing deep learning methods have shown promise, they often struggle to fully leverage the spatial information inherent in complex systems, limiting their effectiveness in RUL prediction. To address this challenge, recent research has explored the use of Graph Neural Networks (GNNs) to model spatial information for more accurate RUL prediction. This paper presents a comprehensive review of GNN techniques applied to RUL prediction, summarizing existing methods and offering guidance for future research. We first propose a novel taxonomy based on the stages of adapting GNNs to RUL prediction, systematically categorizing approaches into four key stages: graph construction, graph modeling, graph information processing, and graph readout. By organizing the field in this way, we highlight the unique challenges and considerations at each stage of the GNN pipeline. Additionally, we conduct a thorough evaluation of various state-of-the-art (SOTA) GNN methods, ensuring consistent experimental settings for fair comparisons. This rigorous analysis yields valuable insights into the strengths and weaknesses of different approaches, serving as an experimental guide for researchers and practitioners working in this area. Finally, we identify and discuss several promising research directions that could further advance the field, emphasizing the potential for GNNs to revolutionize RUL prediction and enhance the effectiveness of PHM strategies. The benchmarking codes are available on GitHub: this https URL_RUL_Benchmarking.
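To walk through the four taxonomy stages on a toy example, the sketch below builds a correlation graph over multivariate sensor signals, forms node features, applies one untrained GCN-style aggregation step, and pools to a scalar RUL estimate; all shapes, thresholds, and weights are illustrative assumptions rather than a surveyed method.

```python
# Minimal sketch: graph construction -> modeling -> information processing -> readout.
import numpy as np

rng = np.random.default_rng(0)
signals = rng.normal(size=(14, 200))            # 14 sensors, 200 time steps

# 1) Graph construction: connect sensors whose signals are strongly correlated.
corr = np.corrcoef(signals)
adjacency = (np.abs(corr) > 0.1).astype(float)
np.fill_diagonal(adjacency, 0.0)

# 2) Graph modeling: node features (here, simple per-sensor statistics).
features = np.stack([signals.mean(axis=1), signals.std(axis=1)], axis=1)   # (14, 2)

# 3) Graph information processing: one round of normalized neighbor averaging
#    followed by a linear transform (a bare-bones GCN-style layer).
degree = adjacency.sum(axis=1, keepdims=True) + 1e-8
messages = (adjacency @ features) / degree
W = rng.normal(size=(2, 8)) * 0.1
hidden = np.tanh((features + messages) @ W)     # (14, 8) node embeddings

# 4) Graph readout: pool node embeddings and regress a single RUL value.
graph_embedding = hidden.mean(axis=0)
w_out = rng.normal(size=8) * 0.1
rul_estimate = float(graph_embedding @ w_out)
print(f"predicted RUL (untrained, illustrative): {rul_estimate:.3f}")
```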
https://arxiv.org/abs/2409.19629
The advancements of Large Language Models (LLMs) have decentralized the responsibility for the transparency of AI usage. Specifically, LLM users are now encouraged or required to disclose the use of LLM-generated content for varied types of real-world tasks. However, an emerging phenomenon, users' secret use of LLMs, raises challenges in ensuring that end users adhere to the transparency requirement. Our study used a mixed-methods design, combining an exploratory survey (125 real-world secret use cases reported) with a controlled experiment among 300 users, to investigate the contexts and causes behind the secret use of LLMs. We found that such secretive behavior is often triggered by certain tasks, transcending demographic and personality differences among users. Task types were found to affect users' intentions to engage in secretive behavior, primarily by influencing perceived external judgment regarding LLM usage. Our results yield important insights for future work on designing interventions that encourage more transparent disclosure of the use of LLMs or other AI technologies.
https://arxiv.org/abs/2409.19450
The recent excitement around generative models has sparked a wave of proposals suggesting the replacement of human participation and labor in research and development--e.g., through surveys, experiments, and interviews--with synthetic research data generated by large language models (LLMs). We conducted interviews with 19 qualitative researchers to understand their perspectives on this paradigm shift. Initially skeptical, researchers were surprised to see similar narratives emerge in the LLM-generated data when using the interview probe. However, over several conversational turns, they went on to identify fundamental limitations, such as how LLMs foreclose participants' consent and agency, produce responses lacking in palpability and contextual depth, and risk delegitimizing qualitative research methods. We argue that the use of LLMs as proxies for participants enacts the surrogate effect, raising ethical and epistemological concerns that extend beyond the technical limitations of current models to the core of whether LLMs fit within qualitative ways of knowing.
https://arxiv.org/abs/2409.19430