We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method to reduce bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches that prune entire attention heads. However, our results also show that the effectiveness of either approach deteriorates quickly as pruning strategies become more generalized. For instance, a pruning strategy tuned to remove racial biases in the context of financial decision-making generalizes poorly to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models; the remainder is highly context-specific, suggesting that generalizable mitigation strategies may be of limited effectiveness. Our findings have important implications for legal frameworks surrounding AI. In particular, they suggest that an effective mitigation strategy should include assigning legal responsibility to those who deploy models in a specific use case.
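As a concrete illustration, here is a minimal sketch of what neuron-level pruning for bias mitigation can look like, assuming per-neuron bias-attribution scores have already been computed; the scoring interface and threshold are hypothetical stand-ins, not the paper's procedure.

```python
import torch
import torch.nn as nn

def prune_biased_neurons(model: nn.Module, bias_scores: dict[str, torch.Tensor],
                         threshold: float = 0.9) -> nn.Module:
    """Zero out neurons whose (hypothetical) bias-attribution score exceeds
    a threshold. `bias_scores` maps a module name to one score per output
    neuron; how such scores are computed is exactly what varies by method.
    """
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name in bias_scores:
            keep = bias_scores[name] < threshold       # low-bias neurons survive
            with torch.no_grad():
                module.weight[~keep] = 0.0             # rows index output neurons
                if module.bias is not None:
                    module.bias[~keep] = 0.0
    return model
```

Head-level pruning would instead zero entire attention heads, a much coarser unit, which is one way to read why the neuron-level variant fares better here.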
https://arxiv.org/abs/2502.07771
We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.
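For readers unfamiliar with the bicriteria convention, the $(0,\epsilon)$-additive guarantee can be unpacked as follows (our schematic rendering, with $V$ the objective value, $C_i$ the constraint values, and $\tau_i$ the thresholds): the returned policy $\pi$ satisfies
$$
V^{\pi}(s_0) \;\ge\; \max_{\pi' \in \Pi_C} V^{\pi'}(s_0)
\qquad \text{and} \qquad
C_i^{\pi}(s_0) \;\le\; \tau_i + \epsilon \;\; \text{for all } i,
$$
where $\Pi_C$ is the set of policies satisfying every constraint exactly. That is, the policy loses nothing in objective value (the $0$) while violating each constraint by at most an additive $\epsilon$.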
https://arxiv.org/abs/2502.07764
This paper presents a novel Natural Language Processing (NLP) framework for enhancing medical diagnosis through the integration of advanced techniques in data augmentation, feature extraction, and classification. The proposed approach employs back-translation to generate diverse paraphrased datasets, improving robustness and mitigating overfitting in classification tasks. Leveraging Decoding-enhanced BERT with Disentangled Attention (DeBERTa) with Dynamic Contextual Positional Gating (DCPG), the model captures fine-grained contextual and positional relationships, dynamically adjusting the influence of positional information based on semantic context to produce high-quality text embeddings. For classification, an Attention-Based Feedforward Neural Network (ABFNN) is utilized, effectively focusing on the most relevant features to improve decision-making accuracy. Applied to the classification of symptoms, clinical notes, and other medical texts, this architecture demonstrates its ability to address the complexities of medical data. The combination of data augmentation, contextual embedding generation, and advanced classification mechanisms offers a robust and accurate diagnostic tool, with potential applications in automated medical diagnosis and clinical decision support. This method demonstrates the effectiveness of the proposed NLP framework for medical diagnosis, achieving remarkable results with an accuracy of 99.78%, recall of 99.72%, precision of 99.79%, and an F1-score of 99.75%. These metrics not only underscore the model's robust performance in classifying medical texts with exceptional precision and reliability but also highlight its superiority over existing methods, making it a highly promising tool for automated diagnostic systems.
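A minimal sketch of the back-translation augmentation step, assuming some machine-translation callable `translate(text, src, tgt)` is available (the function name and pivot languages are illustrative, not the paper's setup):

```python
def back_translate(text: str, translate, pivot: str = "fr") -> str:
    """Paraphrase `text` by translating to a pivot language and back.
    `translate` is a placeholder for any MT system (e.g. a MarianMT or
    cloud-API wrapper); the round trip changes surface form, not the label.
    """
    pivoted = translate(text, src="en", tgt=pivot)
    return translate(pivoted, src=pivot, tgt="en")

def augment(dataset, translate, pivots=("fr", "de", "es")):
    # Each pivot language yields one extra paraphrase per labeled example.
    for text, label in dataset:
        yield text, label
        for p in pivots:
            yield back_translate(text, translate, pivot=p), label
```

The paraphrases diversify the training distribution, which is where the claimed robustness and overfitting gains come from.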
https://arxiv.org/abs/2502.07755
Designing efficient optimizers for large language models (LLMs) with low memory requirements and fast convergence is an important and challenging problem. This paper takes a step toward the systematic design of such optimizers through the lens of structured Fisher information matrix (FIM) approximation. We show that many state-of-the-art efficient optimizers can be viewed as solutions to FIM approximation (under the Frobenius norm) with specific structural assumptions. Building on these insights, we propose two design recommendations for practical efficient LLM optimizers: carefully selecting structural assumptions to balance generality and efficiency, and enhancing the memory efficiency of optimizers with general structures through a novel low-rank extension framework. We demonstrate how to use each design approach by deriving new memory-efficient optimizers: Row and Column Scaled SGD (RACS) and Adaptive low-dimensional subspace estimation (Alice). Experiments on LLaMA pre-training (up to 1B parameters) validate their effectiveness, showing faster and better convergence than existing memory-efficient baselines and Adam, with little memory overhead. Notably, Alice converges more than 2x faster than Adam, while RACS delivers strong performance on the 1B model with SGD-like memory.
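To make the RACS idea concrete, here is a speculative sketch of a row-and-column-scaled SGD step in the Adafactor style, keeping only two vectors of second-moment statistics per weight matrix instead of a full matrix; this is our reading of the name, not the paper's exact update rule.

```python
import torch

def racs_like_step(param, grad, row_state, col_state,
                   lr=1e-3, beta=0.99, eps=1e-8):
    """One sketched update: factored second moments (one value per row and
    per column) reconstruct a rank-1 proxy for Adam's per-entry statistics,
    giving SGD-like memory for 2-D parameters.
    """
    # Exponential moving averages of squared gradients, factored by axis.
    row_state.mul_(beta).add_((1 - beta) * grad.pow(2).mean(dim=1))
    col_state.mul_(beta).add_((1 - beta) * grad.pow(2).mean(dim=0))
    # Rank-1 reconstruction of the per-entry second moment.
    v = torch.outer(row_state, col_state) / row_state.mean().clamp_min(eps)
    param.add_(grad / v.sqrt().clamp_min(eps), alpha=-lr)
```

For an $m \times n$ weight, the optimizer state is $m + n$ numbers rather than $mn$, which is what "SGD-like memory" buys on a 1B-parameter model.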
https://arxiv.org/abs/2502.07752
Distributed Learning (DL) enables the training of machine learning models across multiple devices, yet it faces challenges like non-IID data distributions and device capability disparities, which can impede training efficiency. Communication bottlenecks further complicate traditional Federated Learning (FL) setups. To mitigate these issues, we introduce the Personalized Federated Learning with Decentralized Selection Training (PFedDST) framework. PFedDST enhances model training by allowing devices to strategically evaluate and select peers based on a comprehensive communication score. This score integrates loss, task similarity, and selection frequency, ensuring optimal peer connections. This selection strategy is tailored to increase local personalization and promote beneficial peer collaborations to strengthen the stability and efficiency of the training process. Our experiments demonstrate that PFedDST not only enhances model accuracy but also accelerates convergence. This approach outperforms state-of-the-art methods in handling data heterogeneity, delivering both faster and more effective training in diverse and decentralized systems.
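A minimal sketch of how such a communication score could combine the three signals; the weights and the log-damping on selection frequency are our illustrative choices, not the paper's exact formula.

```python
import math

def communication_score(loss: float, task_sim: float, times_selected: int,
                        w=(1.0, 1.0, 0.5)) -> float:
    """Higher is better: similar tasks attract, while high local loss and
    repeated prior selection repel (the latter keeps peer choice diverse).
    """
    w_loss, w_sim, w_freq = w
    return w_sim * task_sim - w_loss * loss - w_freq * math.log1p(times_selected)

def select_peers(candidates, k=3):
    # candidates: iterable of (peer_id, loss, task_sim, times_selected)
    ranked = sorted(candidates,
                    key=lambda c: communication_score(c[1], c[2], c[3]),
                    reverse=True)
    return [peer_id for peer_id, *_ in ranked[:k]]
```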
https://arxiv.org/abs/2502.07750
We present a novel dataset, WhoDunIt, to assess the deductive reasoning capabilities of large language models (LLMs) within narrative contexts. Constructed from open-domain mystery novels and short stories, the dataset challenges LLMs to identify the perpetrator after reading and comprehending the story. To evaluate model robustness, we apply a range of character-level name augmentations, including original names, name swaps, and substitutions with well-known real and/or fictional entities from popular discourse. We further use various prompting styles to investigate the influence of prompting on deductive reasoning accuracy. We conduct an evaluation study with state-of-the-art models, specifically GPT-4o, GPT-4-turbo, and GPT-4o-mini, using multiple trials with majority response selection to ensure reliability. The results demonstrate that while LLMs perform reliably on unaltered texts, accuracy diminishes with certain name substitutions, particularly those with wide recognition. This dataset is publicly available here.
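A minimal sketch of the name-augmentation idea, with a two-pass replacement so that swapped names do not collide (implementation details are ours, not the dataset's exact pipeline):

```python
import random

def swap_names(story: str, culprit: str, characters: list[str],
               replacements: list[str] | None = None) -> tuple[str, str]:
    """Permute character names among themselves (a name swap) or map each
    to a well-known entity (a substitution). Returns the augmented story
    and the culprit's new name, so the answer key stays consistent.
    """
    targets = replacements or random.sample(characters, len(characters))
    mapping = dict(zip(characters, targets))
    # Two passes via placeholders so e.g. Anna->Ben and Ben->Anna don't chain.
    for i, old in enumerate(mapping):
        story = story.replace(old, f"\x00{i}\x00")
    for i, old in enumerate(mapping):
        story = story.replace(f"\x00{i}\x00", mapping[old])
    return story, mapping[culprit]
```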
https://arxiv.org/abs/2502.07747
Next-Token Prediction (NTP) is the de facto approach for autoregressive (AR) video generation, but it suffers from suboptimal unidirectional dependencies and slow inference speed. In this work, we propose a semi-autoregressive (semi-AR) framework, called Next-Block Prediction (NBP), for video generation. By uniformly decomposing video content into equal-sized blocks (e.g., rows or frames), we shift the generation unit from individual tokens to blocks, allowing each token in the current block to simultaneously predict the corresponding token in the next block. Unlike traditional AR modeling, our framework employs bidirectional attention within each block, enabling tokens to capture more robust spatial dependencies. By predicting multiple tokens in parallel, NBP models significantly reduce the number of generation steps, leading to faster and more efficient inference. Our model achieves FVD scores of 103.3 on UCF101 and 25.5 on K600, outperforming the vanilla NTP model by an average of 4.4. Furthermore, thanks to the reduced number of inference steps, the NBP model generates 8.89 frames (128x128 resolution) per second, an 11x speedup. We also explored model scales ranging from 700M to 3B parameters, observing significant improvements in generation quality, with FVD scores dropping from 103.3 to 55.3 on UCF101 and from 25.5 to 19.5 on K600, demonstrating the scalability of our approach.
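The attention pattern this implies is block-causal: bidirectional inside a block, causal across blocks. A minimal sketch (with block_size = 1 it reduces to the ordinary next-token causal mask):

```python
import torch

def block_causal_mask(num_tokens: int, block_size: int) -> torch.Tensor:
    """Boolean mask where True = attention allowed: a query token may attend
    to every token in its own block (bidirectional) and in earlier blocks.
    """
    block_id = torch.arange(num_tokens) // block_size
    # mask[q, k] is True iff key k's block is not after query q's block.
    return block_id.unsqueeze(0) <= block_id.unsqueeze(1)
```

The shift of the prediction target, from the next token to the corresponding token one block ahead, lives in the training loss rather than in this mask.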
https://arxiv.org/abs/2502.07737
Ear recognition is a contactless and unobtrusive biometric technique with applications across various domains. However, deploying high-performing ear recognition models on resource-constrained devices is challenging, limiting their applicability and widespread adoption. This paper introduces EdgeEar, a lightweight model based on a proposed hybrid CNN-transformer architecture to solve this problem. By incorporating low-rank approximations into specific linear layers, EdgeEar reduces its parameter count by a factor of 50 compared to the current state-of-the-art, bringing it below two million while maintaining competitive accuracy. Evaluation on the Unconstrained Ear Recognition Challenge (UERC2023) benchmark shows that EdgeEar achieves the lowest EER while significantly reducing computational costs. These findings demonstrate the feasibility of efficient and accurate ear recognition, which we believe will contribute to the wider adoption of ear biometrics.
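The parameter-cutting device is generic low-rank factorization of selected linear layers; a minimal sketch follows (which layers to factor and at what rank are the paper's design choices, not shown here):

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A d_in -> d_out linear map factored through rank r, cutting weights
    from d_out*d_in to roughly r*(d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_out)               # r -> d_out

    def forward(self, x):
        return self.up(self.down(x))
```

For example, at d_in = d_out = 512 and rank 16, the weight count drops from 262,144 to 16,384 (plus the output bias), a 16x saving from this layer alone.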
https://arxiv.org/abs/2502.07734
Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content--it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors' intrinsic motivations--rather than relying solely on external incentives--can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.
https://arxiv.org/abs/2502.07732
Large language models (LLMs) have demonstrated remarkable code generation capabilities, but the correctness of the generated code cannot be inherently trusted. This paper explores the feasibility of using formal software verification, specifically the SPARK framework for Ada, to ensure the reliability of LLM-generated code. We present Marmaragan, a tool that leverages an LLM to generate SPARK annotations for existing programs, enabling formal verification of the code. The tool is benchmarked on a curated set of SPARK programs, with annotations selectively removed to test specific capabilities. The performance of Marmaragan with GPT-4o on the benchmark is promising: correct annotations were generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.
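Sketched as a loop, the benchmark harness might look like the following; `llm_annotate` is a placeholder for the model call, and checking gnatprove's exit code is a simplification (real runs should parse its report):

```python
import subprocess
from pathlib import Path

def try_annotate(llm_annotate, unit_path: str, project_file: str) -> bool:
    """Ask an LLM to fill in SPARK annotations for one unit, then let the
    prover judge the result. Flags and harness are our guesses, not
    Marmaragan's released tooling.
    """
    unit = Path(unit_path)
    original = unit.read_text()
    unit.write_text(llm_annotate(original))        # overwrite with candidate
    proof = subprocess.run(["gnatprove", "-P", project_file],
                           capture_output=True, text=True)
    if proof.returncode != 0:                      # simplification, see above
        unit.write_text(original)                  # roll back failed attempt
        return False
    return True
```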
https://arxiv.org/abs/2502.07728
The prevalence of noisy labels in real-world datasets poses a significant impediment to the effective deployment of deep learning models. While meta-learning strategies have emerged as a promising approach for addressing this challenge, existing methods often suffer from limited transferability and task-specific designs. This paper introduces TMLC-Net, a novel Transferable Meta-Learner for Correcting Noisy Labels, designed to overcome these limitations. TMLC-Net learns a general-purpose label correction strategy that can be readily applied across diverse datasets and model architectures without requiring extensive retraining or fine-tuning. Our approach integrates three core components: (1) Normalized Noise Perception, which captures and normalizes training dynamics to handle distribution shifts; (2) Time-Series Encoding, which models the temporal evolution of sample statistics using a recurrent neural network; and (3) Subclass Decoding, which predicts a corrected label distribution based on the learned representations. We conduct extensive experiments on benchmark datasets with various noise types and levels, demonstrating that TMLC-Net consistently outperforms state-of-the-art methods in terms of both accuracy and robustness to label noise. Furthermore, we analyze the transferability of TMLC-Net, showcasing its adaptability to new datasets and noise conditions, and establishing its potential as a broadly applicable solution for robust deep learning in noisy environments.
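Mapped onto code, the three components suggest a pipeline like the sketch below; layer sizes and the choice of per-sample statistics are our guesses, not the paper's.

```python
import torch.nn as nn

class TMLCNetSketch(nn.Module):
    """Skeleton of the abstract's three components: (1) normalize training
    dynamics, (2) encode their evolution over epochs with an RNN,
    (3) decode a corrected label distribution.
    """
    def __init__(self, stat_dim=4, hidden=64, num_classes=10):
        super().__init__()
        self.norm = nn.LayerNorm(stat_dim)                          # (1)
        self.encoder = nn.GRU(stat_dim, hidden, batch_first=True)   # (2)
        self.decoder = nn.Linear(hidden, num_classes)               # (3)

    def forward(self, stats):            # stats: (batch, epochs, stat_dim)
        h, _ = self.encoder(self.norm(stats))
        return self.decoder(h[:, -1]).softmax(-1)  # corrected label dist.
```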
https://arxiv.org/abs/2502.07721
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM agents trained with online RL in high-dimensional and evolving goal spaces, a key challenge for LP prediction is modeling one's own competence, a form of metacognitive monitoring. Traditional approaches either require extensive sampling or rely on brittle expert-defined goal groupings. We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, we show that MAGELLAN improves LP prediction efficiency and goal prioritization, being the only method allowing the agent to fully master a large and evolving goal space. These results demonstrate how augmenting LLM agents with a metacognitive ability for LP predictions can effectively scale curriculum learning to open-ended goal spaces.
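The quantity being predicted, learning progress, is classically estimated per goal from a sliding window of outcomes; a minimal sketch of that baseline follows (MAGELLAN's point is to replace such per-goal counters with an LLM head that generalizes across semantically related goals):

```python
from collections import deque

class LPEstimator:
    """Absolute learning progress for one goal: the gap between recent and
    older success rates over sliding windows. Keeping one instance per goal
    is exactly the cost MAGELLAN avoids in large goal spaces.
    """
    def __init__(self, window: int = 50):
        self.older = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, success: bool) -> None:
        if len(self.recent) == self.recent.maxlen:
            self.older.append(self.recent[0])   # outcome about to be evicted
        self.recent.append(float(success))

    def lp(self) -> float:
        if not self.older or not self.recent:
            return 0.0
        mean = lambda d: sum(d) / len(d)
        return abs(mean(self.recent) - mean(self.older))
```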
https://arxiv.org/abs/2502.07709
To help users make privacy-related decisions, personalized privacy assistants based on AI technology have been developed in recent years. These AI-driven Personalized Privacy Assistants (AI-driven PPAs) can deliver significant benefits to users, who may otherwise struggle to make decisions about their personal data in environments saturated with privacy-related decision requests. However, no study has systematically examined the features of these AI-driven PPAs, their underlying technologies, or the accuracy of their decisions. To fill this gap, we present a Systematization of Knowledge (SoK) to map the existing solutions found in the scientific literature. We screened 1697 unique research papers from the last decade (2013-2023), constructing a classification from the 39 included papers. As a result, this SoK reviews several aspects of existing research on AI-driven PPAs in terms of types of publications, contributions, methodological quality, and other quantitative insights. Furthermore, we provide a comprehensive classification for AI-driven PPAs, delving into their architectural choices, system contexts, types of AI used, data sources, types of decisions, and control over decisions, among other facets. Based on our SoK, we further underline the research gaps and challenges and formulate recommendations for the design and development of AI-driven PPAs, as well as avenues for future research.
https://arxiv.org/abs/2502.07693
Artificial Intelligence (AI) systems are increasingly intertwined with daily life, assisting users in executing various tasks and providing guidance on decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized controlled trial with 233 participants, we examined human susceptibility to such manipulation in financial (e.g., purchases) and emotional (e.g., conflict resolution) decision-making contexts. Participants interacted with one of three AI agents: a neutral agent (NA) optimizing for user benefit without explicit influence, a manipulative agent (MA) designed to covertly influence beliefs and behaviors, or a strategy-enhanced manipulative agent (SEMA) employing explicit psychological tactics to reach its hidden objectives. By analyzing participants' decision patterns and shifts in their preference ratings post-interaction, we found significant susceptibility to AI-driven manipulation. Particularly, across both decision-making domains, participants interacting with the manipulative agents shifted toward harmful options at substantially higher rates (financial, MA: 62.3%, SEMA: 59.6%; emotional, MA: 42.3%, SEMA: 41.5%) compared to the NA group (financial, 35.8%; emotional, 12.8%). Notably, our findings reveal that even subtle manipulative objectives (MA) can be as effective as employing explicit psychological strategies (SEMA) in swaying human decision-making. By revealing the potential for covert AI influence, this study highlights a critical vulnerability in human-AI interactions, emphasizing the need for ethical safeguards and regulatory frameworks to ensure responsible deployment of AI technologies and protect human autonomy.
https://arxiv.org/abs/2502.07663
We propose a general and unifying framework for causal Imitation Learning (IL) with hidden confounders that subsumes several existing confounded IL settings from the literature. Our framework accounts for two types of hidden confounders: (a) those observed by the expert, which thus influence the expert's policy, and (b) confounding noise hidden to both the expert and the IL algorithm. For additional flexibility, we also introduce a confounding noise horizon and time-varying expert-observable hidden variables. We show that causal IL in our framework can be reduced to a set of Conditional Moment Restrictions (CMRs) by leveraging trajectory histories as instruments to learn a history-dependent policy. We propose DML-IL, a novel algorithm that uses instrumental variable regression to solve these CMRs and learn a policy. We provide a bound on the imitation gap for DML-IL, which recovers prior results as special cases. Empirical evaluation on a toy environment with continuous state-action spaces and on multiple MuJoCo tasks demonstrates that DML-IL outperforms state-of-the-art causal IL algorithms.
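Schematically, using the trajectory history $h_{t-1}$ as the instrument, the policy is pinned down by conditional moments of the form
$$
\mathbb{E}\big[\, \rho(s_t, a_t; \pi_\theta) \,\big|\, h_{t-1} \,\big] = 0,
$$
where $\rho$ is a residual that vanishes when $\pi_\theta$ reproduces the expert's action distribution (our schematic rendering of a generic CMR, not the paper's exact notation); instrumental variable regression then solves for $\pi_\theta$.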
https://arxiv.org/abs/2502.07656
To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each having a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying that smart contracts follow ERCs. Today's practices for such verification are to manually audit each individual contract, use expert-developed program-analysis tools, or use large language models (LLMs), all of which are far from effective in identifying ERC rule violations. This paper introduces SymGPT, a tool that combines the natural language understanding of LLMs with the formal guarantees of symbolic execution to automatically verify smart contracts' compliance with ERC rules. To develop SymGPT, we conduct an empirical study of 132 ERC rules from three widely used ERC standards, examining their content, security implications, and natural language descriptions. Based on this study, we design SymGPT by first instructing an LLM to translate ERC rules into a defined EBNF grammar. We then synthesize constraints from the formalized rules to represent scenarios where violations may occur and use symbolic execution to detect them. Our evaluation shows that SymGPT identifies 5,783 ERC rule violations in 4,000 real-world contracts, including 1,375 violations with clear attack paths for stealing financial assets, demonstrating its effectiveness. Furthermore, SymGPT outperforms six automated techniques and a security-expert auditing service, underscoring its superiority over current smart contract analysis methods.
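An illustrative toy of the constraint-then-solve step, using the Z3 solver: one ERC-20 rule ("transfer MUST throw if the sender's balance is insufficient") becomes a satisfiability query for a violating path. SymGPT derives such constraints from the LLM-formalized rules and checks them along symbolically executed paths; the toy below ignores actual EVM semantics.

```python
from z3 import Ints, Solver, And, sat

balance, amount, success = Ints("balance amount success")

path_condition = And(balance >= 0, amount > balance)  # insufficient funds...
rule_violation = success == 1                         # ...yet the call succeeds

s = Solver()
s.add(path_condition, rule_violation)
if s.check() == sat:        # a model here is a concrete violation witness
    print("violation witness:", s.model())
```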
https://arxiv.org/abs/2502.07644
We introduce Goedel-Prover, an open-source large language model (LLM) that achieves the state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems. The key challenge in this field is the scarcity of formalized math statements and proofs, which we tackle in the following ways. We train statement formalizers to translate the natural language math problems from Numina into formal language (Lean 4), creating a dataset of 1.64 million formal statements. LLMs are used to check that the formal statements accurately preserve the content of the original natural language problems. We then iteratively build a large dataset of formal proofs by training a series of provers. Each prover succeeds in proving many statements that the previous ones could not, and these new proofs are added to the training set for the next prover. The final prover outperforms all existing open-source models in whole-proof generation. On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%. On PutnamBench, Goedel-Prover successfully solves 7 problems (Pass@512), ranking first on the leaderboard. Furthermore, it generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works.
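The iterative proof-collection loop reads naturally as expert iteration; a minimal sketch with `train`, `prove`, and `verify` standing in for supervised fine-tuning, proof search, and the Lean 4 checker (placeholders, not the released tooling):

```python
def iterative_prover_training(statements, base_model, train, prove, verify,
                              rounds=8):
    """Each round: the current prover attempts every still-unsolved
    statement, verified proofs join the training set, and the next prover
    is trained on the enlarged set.
    """
    proofs, prover = {}, base_model
    for _ in range(rounds):
        for stmt in statements:
            if stmt in proofs:
                continue
            candidate = prove(prover, stmt)
            if candidate is not None and verify(stmt, candidate):
                proofs[stmt] = candidate
        prover = train(base_model, proofs)   # retrain on all verified proofs
    return prover, proofs
```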
https://arxiv.org/abs/2502.07640
We investigate the problem of distributed training under partial observability, whereby cooperative multi-agent reinforcement learning (MARL) agents maximize the expected cumulative joint reward. We propose distributed value decomposition networks (DVDN) that generate a joint Q-function that factorizes into agent-wise Q-functions. Whereas the original value decomposition networks rely on centralized training, our approach is suitable for domains where centralized training is not possible and agents must learn by interacting with the physical environment in a decentralized manner while communicating with their peers. DVDN overcomes the need for centralized training by locally estimating the shared objective. We contribute two novel algorithms, DVDN and DVDN (GT), for the heterogeneous and homogeneous agent settings, respectively. Empirically, both algorithms approximate the performance of value decomposition networks, despite the information loss during communication, as demonstrated in ten MARL tasks across three standard environments.
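The factorization at the heart of value decomposition is
$$
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a}) \;=\; \sum_{i=1}^{N} Q_i(\tau_i, a_i),
$$
where $\tau_i$ and $a_i$ are agent $i$'s local action-observation history and action. The original networks fit this sum with a centralized trainer; per the abstract, DVDN instead has each agent estimate the shared objective locally and refine that estimate through peer communication (we read the "GT" suffix as gradient tracking, a standard tool for such decentralized estimation, though the abstract does not expand it).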
https://arxiv.org/abs/2502.07635
Imagination in world models is crucial for enabling agents to learn long-horizon policies in a sample-efficient manner. Existing recurrent state-space model (RSSM)-based world models depend on single-step statistical inference to capture the environment dynamics and, hence, are unable to perform long-term imagination tasks due to the accumulation of prediction errors. Inspired by the dual-process theory of human cognition, we propose a novel dual-mind world model (DMWM) framework that integrates logical reasoning to enable imagination with logical consistency. DMWM is composed of two components: an RSSM-based System 1 (RSSM-S1) component that handles state transitions in an intuitive manner, and a logic-integrated neural network-based System 2 (LINN-S2) component that guides the imagination process through hierarchical deep logical reasoning. An inter-system feedback mechanism is designed to ensure that the imagination process follows the logical rules of the real environment. The proposed framework is evaluated on benchmark tasks requiring long-term planning from the DMControl suite. Extensive experimental results demonstrate that the proposed framework yields significant improvements in terms of logical coherence, trial efficiency, data efficiency, and long-term imagination over state-of-the-art world models.
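An imagination rollout under this design might be loosely sketched as the loop below; all interfaces are our guesses at the abstract's description, not the paper's code.

```python
def imagine(rssm_s1, linn_s2, state, horizon: int):
    """Dual-process rollout: System 1 proposes the next latent state
    intuitively; System 2 refines it against learned logical rules, the
    inter-system feedback that keeps long rollouts logically consistent.
    """
    trajectory = [state]
    for _ in range(horizon):
        proposal = rssm_s1.predict(state)                # fast, intuitive
        state = linn_s2.correct(proposal, trajectory)    # slow, logical
        trajectory.append(state)
    return trajectory
```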
https://arxiv.org/abs/2502.07591
This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.
https://arxiv.org/abs/2502.07586