With advances in diffusion models, image generation has shown significant performance improvements. This raises concerns about the potential abuse of image generation, such as the creation of explicit or violent images, commonly referred to as Not Safe For Work (NSFW) content. To address this, the Stable Diffusion model includes several safety checkers to censor initial text prompts and final output images generated by the model. However, recent research has shown that these safety checkers are vulnerable to adversarial attacks, allowing attackers to bypass them and generate NSFW images. In this paper, we find that these adversarial attacks are not robust to small changes in text prompts or input latents. Based on this, we propose CROPS (Circular or RandOm Prompts for Safety), a model-agnostic framework that easily defends against adversarial attacks generating NSFW images without requiring additional training. Moreover, we develop an approach that utilizes one-step diffusion models for efficient NSFW detection (CROPS-1), further reducing computational resources. We demonstrate the superiority of our method in terms of performance and applicability.
https://arxiv.org/abs/2501.05359
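The defense idea above rests on a simple observation: adversarial prompts that slip past a safety checker tend to be brittle under small perturbations. A minimal sketch of that idea in Python, assuming user-supplied `generate_image` and `safety_checker` callables and an illustrative word-dropout perturbation (not the authors' implementation):

```python
import random

def randomly_perturb(prompt: str, num_variants: int = 8, drop_prob: float = 0.1) -> list[str]:
    """Create slightly perturbed copies of a prompt via random word dropout.

    This is an illustrative perturbation; the paper may perturb prompts
    and/or input latents differently.
    """
    words = prompt.split()
    variants = []
    for _ in range(num_variants):
        kept = [w for w in words if random.random() > drop_prob] or words
        variants.append(" ".join(kept))
    return variants

def robust_nsfw_check(prompt: str, generate_image, safety_checker, threshold: float = 0.5) -> bool:
    """Flag a prompt as unsafe if the safety checker fires on a sufficient
    fraction of randomly perturbed variants.

    `generate_image` maps a prompt to an image and `safety_checker` maps an
    image to an NSFW probability in [0, 1]; both are assumed callables.
    """
    scores = [safety_checker(generate_image(p)) for p in randomly_perturb(prompt)]
    return sum(s > 0.5 for s in scores) / len(scores) >= threshold
```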
Cloud removal plays a crucial role in enhancing remote sensing image analysis, yet accurately reconstructing cloud-obscured regions remains a significant challenge. Recent advancements in generative models have made the generation of realistic images increasingly accessible, offering new opportunities for this task. Given the conceptual alignment between image generation and cloud removal tasks, generative models present a promising approach for addressing cloud removal in remote sensing. In this work, we propose a deep transfer learning approach built on a generative adversarial network (GAN) framework to explore the potential of the novel masked autoencoder (MAE) image reconstruction model in cloud removal. Due to the complexity of remote sensing imagery, we further propose using a patch-wise discriminator to determine whether each patch of the image is real or not. The proposed reconstructive transfer learning approach demonstrates significant improvements in cloud removal performance compared to other GAN-based methods. Additionally, whilst direct comparisons with some of the state-of-the-art cloud removal techniques are limited due to unclear details regarding their train/test data splits, the proposed model achieves competitive results based on available benchmarks.
https://arxiv.org/abs/2501.05265
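The patch-wise discriminator mentioned above can be realized as a fully convolutional network that outputs a grid of real/fake logits, one per receptive-field patch (the PatchGAN design). A PyTorch sketch with illustrative layer sizes, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator that scores each image patch.

    The output is an (N, 1, H', W') map of logits; each spatial position
    corresponds to one receptive-field patch of the input. Channel widths
    and depth here are illustrative choices.
    """
    def __init__(self, in_channels: int = 3, base: int = 64):
        super().__init__()
        layers = []
        channels = [in_channels, base, base * 2, base * 4]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers.append(nn.Conv2d(channels[-1], 1, 4, stride=1, padding=1))  # per-patch logits
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: a 256x256 image yields a grid of patch logits rather than a single scalar.
logits = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 1, 31, 31])
```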
Being a form of biometric identification, the security of the speaker identification (SID) system is of utmost importance. To better understand the robustness of SID systems, we aim to perform more realistic attacks in SID, which are challenging for both humans and machines to detect. In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach that exploits the capability of a diffusion-based voice conversion (DiffVC) model to generate adversarial fake audio with distinct target speaker attribution. By introducing adversarial constraints into the generative process of the diffusion-based voice conversion model, we craft fake samples that effectively mislead target models while preserving speaker-wise characteristics. Specifically, inspired by the use of randomly sampled Gaussian noise in conventional adversarial attacks and diffusion processes, we incorporate adversarial constraints into the reverse diffusion process. These constraints subtly guide the reverse diffusion process toward aligning with the target speaker distribution. Our experiments on the LibriTTS dataset indicate that DiffAttack significantly improves the attack success rate compared to vanilla DiffVC and other methods. Moreover, objective and subjective evaluations demonstrate that introducing adversarial constraints does not compromise the speech quality generated by the DiffVC model.
https://arxiv.org/abs/2501.05127
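The core mechanism, adversarial constraints injected into the reverse diffusion process, can be pictured as a small gradient correction applied after each denoising step that nudges the intermediate sample toward the target speaker's region of the SID model. A heavily simplified sketch under that assumption; `reverse_step` and `sid_model` are placeholders for the DiffVC update and the speaker-identification network:

```python
import torch

def adversarial_reverse_diffusion(x_T, reverse_step, sid_model, target_spk, steps, eps=1e-3):
    """Reverse diffusion with an adversarial guidance term (illustrative).

    `reverse_step(x, t)` is assumed to be the model's standard denoising update,
    `sid_model(x)` returns speaker logits, and `target_spk` is the target class.
    The small gradient correction pushes samples toward the target speaker while
    the denoiser keeps them on the data manifold (preserving timbre and quality).
    """
    x = x_T
    for t in reversed(range(steps)):
        x = reverse_step(x, t)                        # ordinary DiffVC denoising update
        x = x.detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(      # want the target speaker to win
            sid_model(x), torch.tensor([target_spk], device=x.device))
        grad, = torch.autograd.grad(loss, x)
        x = (x - eps * grad.sign()).detach()           # small adversarial correction
    return x
```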
Adversarial attacks are allegedly unnoticeable. Prior studies have designed attack noticeability measures on graphs, primarily using statistical tests to compare the topology of original and (possibly) attacked graphs. However, we observe two critical limitations in the existing measures. First, because the measures rely on simple rules, attackers can readily enhance their attacks to bypass them, reducing their attack "noticeability" while maintaining their attack performance. Second, because the measures naively leverage global statistics, such as degree distributions, they may entirely overlook attacks until severe perturbations occur, leaving the attacks almost "totally unnoticeable." To address these limitations, we introduce HideNSeek, a learnable measure for graph attack noticeability. First, to mitigate the bypass problem, HideNSeek learns to distinguish the original and (potential) attack edges using a learnable edge scorer (LEO), which scores each edge on its likelihood of being an attack. Second, to mitigate the overlooking problem, HideNSeek conducts imbalance-aware aggregation of all the edge scores to obtain the final noticeability score. Using six real-world graphs, we empirically demonstrate that HideNSeek effectively alleviates the observed limitations, and LEO (i.e., our learnable edge scorer) outperforms eleven competitors in distinguishing attack edges under five different attack methods. As an additional application, we show that LEO boosts the performance of robust GNNs by removing attack-like edges.
https://arxiv.org/abs/2501.05015
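One way to picture LEO and the imbalance-aware aggregation: score every edge from its endpoint embeddings, then summarize the scores so that a handful of attack edges is not diluted by the benign majority (here, by averaging only the top-scoring fraction). A hypothetical PyTorch sketch, not the authors' architecture:

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """Scores each edge's likelihood of being adversarial from node embeddings."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, node_emb: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        src, dst = edge_index                      # edge_index: (2, E) long tensor
        pair = torch.cat([node_emb[src], node_emb[dst]], dim=-1)
        return self.mlp(pair).squeeze(-1)          # one logit per edge

def noticeability_score(edge_logits: torch.Tensor, top_frac: float = 0.05) -> torch.Tensor:
    """Imbalance-aware aggregation (illustrative): average only the most
    suspicious edges, so a few attack edges are not drowned out by the
    benign majority."""
    k = max(1, int(top_frac * edge_logits.numel()))
    return torch.sigmoid(edge_logits).topk(k).values.mean()
```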
Deep learning models have demonstrated remarkable performance across various computer vision tasks, yet their vulnerability to distribution shifts remains a critical challenge. Despite sophisticated neural network architectures, existing models often struggle to maintain consistent performance when confronted with Out-of-Distribution (OOD) samples, including natural corruptions, adversarial perturbations, and anomalous patterns. We introduce LayerMix, an innovative data augmentation approach that systematically enhances model robustness through structured fractal-based image synthesis. By meticulously integrating structural complexity into training datasets, our method generates semantically consistent synthetic samples that significantly improve neural network generalization capabilities. Unlike traditional augmentation techniques that rely on random transformations, LayerMix employs a structured mixing pipeline that preserves original image semantics while introducing controlled variability. Extensive experiments across multiple benchmark datasets, including CIFAR-10, CIFAR-100, ImageNet-200, and ImageNet-1K, demonstrate LayerMix's superior classification accuracy and substantial improvements on critical Machine Learning (ML) safety metrics, including resilience to natural image corruptions, robustness against adversarial attacks, improved model calibration, and enhanced prediction consistency. LayerMix represents a significant advancement toward developing more reliable and adaptable artificial intelligence systems by addressing the fundamental challenges of deep learning generalization. The code is available at this https URL.
https://arxiv.org/abs/2501.04861
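As a rough illustration of structured mixing, the sketch below blends a fractal/structural image into a training sample with bounded weights so that the original semantics (and label) remain dominant. The actual LayerMix pipeline uses a more elaborate, carefully designed sequence of mixing operations:

```python
import torch

def structured_mix(image: torch.Tensor, fractal: torch.Tensor,
                   alpha: float = 0.3, stages: int = 2) -> torch.Tensor:
    """Illustrative structured mixing: repeatedly blend a fractal/structural
    image into the sample with a small weight, preserving the original label.

    `image` and `fractal` are (C, H, W) tensors in [0, 1]. This is only a
    hedged sketch of the general idea, not LayerMix's mixing schedule.
    """
    mixed = image
    for _ in range(stages):
        w = alpha * torch.rand(1).item()        # random but bounded mixing weight
        mixed = (1 - w) * mixed + w * fractal   # convex blend keeps semantics dominant
    return mixed.clamp(0, 1)
```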
HotFlip is a topical gradient-based word substitution method for attacking language models. Recently, this method has been further applied to attack retrieval systems by generating malicious passages that are injected into a corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally inefficient, with the majority of time being spent on gradient accumulation for each query-passage pair during the adversarial token generation phase, making it impossible to generate an adequate number of adversarial passages in a reasonable amount of time. Moreover, the attack method itself assumes access to a set of user queries, a strong assumption that does not correspond to how real-world adversarial attacks are usually performed. In this paper, we first significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15 minutes, using the same hardware. We further contribute experiments and analysis on two additional tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks. Whenever possible, we provide comparisons between the original method and our improved version. Our experiments demonstrate that HotFlip can effectively attack a variety of dense retrievers, with an observed trend that its attack performance diminishes against more advanced and recent methods. Interestingly, we observe that while HotFlip performs poorly in a black-box setting, indicating limited capacity for generalization, in query-agnostic scenarios its performance is correlated to the volume of injected adversarial passages.
https://arxiv.org/abs/2501.04802
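For context, the core HotFlip operation selects, for a chosen position, the vocabulary token whose substitution maximizes a first-order approximation of the attack objective, using the gradient of the loss with respect to the token embedding. A compact sketch of that operation (independent of the paper's efficiency improvements):

```python
import torch

def hotflip_candidate(loss_grad_at_pos: torch.Tensor,
                      embedding_matrix: torch.Tensor,
                      current_token_id: int) -> int:
    """Return the token id whose substitution most increases the loss, under
    the first-order approximation loss(e_new) ~ loss(e_old) + (e_new - e_old) . grad.

    loss_grad_at_pos: (d,) gradient of the attack loss w.r.t. the embedding
                      at the position being flipped.
    embedding_matrix: (V, d) token embedding table.
    """
    e_old = embedding_matrix[current_token_id]
    gain = (embedding_matrix - e_old) @ loss_grad_at_pos   # (V,) predicted loss increase
    gain[current_token_id] = float("-inf")                 # forbid a no-op flip
    return int(gain.argmax())
```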
Adversarial training has proven to be a highly effective method for improving the robustness of deep neural networks against adversarial attacks. Nonetheless, it has been observed to exhibit a limitation in terms of robust fairness, characterized by a significant disparity in robustness across different classes. Recent efforts to mitigate this problem have turned to class-wise reweighted methods. However, these methods suffer from a lack of rigorous theoretical analysis and are limited in their exploration of the weight space, as they mainly rely on existing heuristic algorithms or intuition to compute weights. In addition, these methods fail to guarantee the consistency of the optimization direction due to the decoupled optimization of weights and model parameters, potentially leading to suboptimal weight assignments and, consequently, a suboptimal model. To address these problems, this paper proposes a novel min-max training framework, Class Optimal Distribution Adversarial Training (CODAT), which employs distributionally robust optimization to fully explore the class-wise weight space, thus enabling the identification of optimal weights with theoretical guarantees. Furthermore, we derive a closed-form optimal solution to the internal maximization and then obtain a deterministic equivalent objective function, which provides a theoretical basis for the joint optimization of weights and model parameters. Meanwhile, we propose a fairness elasticity coefficient for the evaluation of the algorithm with regard to both robustness and robust fairness. Experimental results on various datasets show that the proposed method can effectively improve the robust fairness of the model and outperform the state-of-the-art approaches.
https://arxiv.org/abs/2501.04527
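For intuition, the sketch below shows one standard closed form for a KL-constrained inner maximization, exponential tilting of per-class losses, and how the resulting weights can be folded into a single deterministic objective so that weights and parameters share one optimization direction. This is an illustration of the general idea, not necessarily CODAT's exact derivation:

```python
import torch

def dro_class_weights(per_class_loss: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Exponential tilting of per-class losses: a standard closed form for the
    inner maximization under a KL ambiguity set (illustrative; CODAT's exact
    solution may differ)."""
    return torch.softmax(per_class_loss / temperature, dim=0)

def dro_style_objective(per_class_loss: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Deterministic equivalent objective: weight classes by the (detached)
    worst-case distribution, so class weights and model parameters are driven
    by one consistent optimization direction."""
    w = dro_class_weights(per_class_loss.detach(), temperature)
    return (w * per_class_loss).sum()
```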
Style voice conversion aims to transform the speaking style of source speech into a desired style while keeping the original speaker's identity. However, previous style voice conversion approaches primarily focus on well-defined domains such as emotional aspects, limiting their practical applications. In this study, we present ZSVC, a novel Zero-shot Style Voice Conversion approach that utilizes a speech codec and a latent diffusion model with a speech prompting mechanism to facilitate in-context learning for speaking style conversion. To disentangle speaking style and speaker timbre, we introduce an information bottleneck to filter the speaking style in the source speech and employ Uncertainty Modeling Adaptive Instance Normalization (UMAdaIN) to perturb the speaker timbre in the style prompt. Moreover, we propose a novel adversarial training strategy to enhance in-context learning and improve style similarity. Experiments conducted on 44,000 hours of speech data demonstrate the superior performance of ZSVC in generating speech with diverse speaking styles in zero-shot scenarios.
https://arxiv.org/abs/2501.04416
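UMAdaIN can be pictured as adaptive instance normalization whose style statistics are perturbed with sampled noise, so the model cannot latch onto exact timbre statistics from the style prompt. A hypothetical sketch of that idea (the paper's uncertainty modeling may differ):

```python
import torch

def um_adain(content: torch.Tensor, style: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Adaptive instance normalization with noise-perturbed style statistics
    (an illustrative stand-in for UMAdaIN).

    content, style: (N, C, T) feature tensors. Perturbing the style mean/std
    blurs speaker timbre while keeping the style information usable.
    """
    c_mean, c_std = content.mean(-1, keepdim=True), content.std(-1, keepdim=True) + 1e-5
    s_mean, s_std = style.mean(-1, keepdim=True), style.std(-1, keepdim=True) + 1e-5
    s_mean = s_mean * (1 + sigma * torch.randn_like(s_mean))        # uncertainty in the mean
    s_std = (s_std * (1 + sigma * torch.randn_like(s_std))).abs()   # uncertainty in the scale
    return (content - c_mean) / c_std * s_std + s_mean
```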
As machine learning (ML) systems increasingly impact critical sectors such as hiring, financial risk assessments, and criminal justice, the imperative to ensure fairness has intensified due to potential negative implications. While much ML fairness research has focused on enhancing training data and processes, addressing the outputs of already deployed systems has received less attention. This paper introduces 'BiasGuard', a novel approach designed to act as a fairness guardrail in production ML systems. BiasGuard leverages Test-Time Augmentation (TTA) powered by the Conditional Tabular GAN (CTGAN), a cutting-edge generative AI model, to synthesize data samples conditioned on inverted protected attribute values, thereby promoting equitable outcomes across diverse groups. This method aims to provide equal opportunities for both privileged and unprivileged groups while significantly enhancing the fairness metrics of deployed systems without the need for retraining. Our comprehensive experimental analysis across diverse datasets reveals that BiasGuard enhances fairness by 31% while only reducing accuracy by 0.09% compared to non-mitigated benchmarks. Additionally, BiasGuard outperforms existing post-processing methods in improving fairness, positioning it as an effective tool to safeguard against biases when retraining the model is impractical.
https://arxiv.org/abs/2501.04142
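The test-time augmentation step can be pictured as: for each incoming record, synthesize counterparts conditioned on the inverted protected attribute and average the classifier's predictions over the original and synthesized versions. In the sketch below, `classifier` and `synthesize` are hypothetical placeholders for the deployed model and a CTGAN-style conditional sampler:

```python
import numpy as np

def biasguard_predict(record: dict, classifier, synthesize, protected_attr: str,
                      n_synthetic: int = 5) -> float:
    """Fairness-aware prediction via test-time augmentation (illustrative).

    `classifier(record) -> float` returns a score, and
    `synthesize(record, attr, value, n) -> list[dict]` is assumed to return n
    synthetic records conditioned on the flipped protected attribute
    (e.g., drawn from a CTGAN-style conditional generator).
    """
    flipped_value = 1 - record[protected_attr]           # assumes a binary attribute
    variants = synthesize(record, protected_attr, flipped_value, n_synthetic)
    scores = [classifier(record)] + [classifier(v) for v in variants]
    return float(np.mean(scores))                         # decision uses the averaged score
```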
Recent advancements in generative AI have made it possible to create synthetic datasets that can be as accurate as real-world data for training AI models, powering statistical insights, and fostering collaboration with sensitive datasets while offering strong privacy guarantees. Effectively measuring the empirical privacy of synthetic data is an important step in the process. However, while there is a multitude of new privacy metrics being published every day, there currently is no standardization. In this paper, we review the pros and cons of popular metrics that include simulations of adversarial attacks. We also review current best practices for amending generative models to enhance the privacy of the data they create (e.g. differential privacy).
https://arxiv.org/abs/2501.03941
The rapid advancement in large language models (LLMs) has significantly enhanced their ability to generate coherent and contextually relevant text, raising concerns about the misuse of AI-generated content and making it critical to detect it. However, the task remains challenging, particularly in unseen domains or with unfamiliar LLMs. Leveraging LLM next-token distribution outputs offers a theoretically appealing approach for detection, as they encapsulate insights from the models' extensive pre-training on diverse corpora. Despite its promise, zero-shot methods that attempt to operationalize these outputs have met with limited success. We hypothesize that one of the problems is that they use the mean to aggregate next-token distribution metrics across tokens, when some tokens are naturally easier or harder to predict and should be weighted differently. Based on this idea, we propose the Perplexity Attention Weighted Network (PAWN), which uses the LLM's last hidden states and token positions to weight a sum of features derived from next-token distribution metrics across the sequence length. Although not zero-shot, our method allows us to cache the last hidden states and next-token distribution metrics on disk, greatly reducing the training resource requirements. PAWN shows competitive and even better performance in-distribution than the strongest baselines (fine-tuned LMs) with a fraction of their trainable parameters. Our model also generalizes better to unseen domains and source models, with smaller variability in the decision boundary across distribution shifts. It is also more robust to adversarial attacks, and if the backbone has multilingual capabilities, it presents decent generalization to languages not seen during supervised training, with LLaMA3-1B reaching a mean macro-averaged F1 score of 81.46% in cross-validation with nine languages.
https://arxiv.org/abs/2501.03940
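PAWN's central move, replacing a plain mean over per-token next-token-distribution metrics with a learned, context-dependent weighting, can be sketched as follows. The feature set and the single-layer attention head are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TokenWeightedAggregator(nn.Module):
    """Learns weights over per-token metrics instead of taking a plain mean.

    `hidden`:  (N, T, d) last hidden states from the LLM.
    `metrics`: (N, T, m) per-token features from the next-token distribution
               (e.g., log-likelihood of the observed token, entropy).
    Both can be precomputed once and cached on disk, as the paper notes.
    """
    def __init__(self, d: int, m: int):
        super().__init__()
        self.attn = nn.Linear(d, 1)        # position/context-dependent weight
        self.head = nn.Linear(m, 1)        # AI vs. human logit from aggregated metrics

    def forward(self, hidden: torch.Tensor, metrics: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.attn(hidden).squeeze(-1), dim=1)   # (N, T)
        pooled = (w.unsqueeze(-1) * metrics).sum(dim=1)           # weighted sum over tokens
        return self.head(pooled).squeeze(-1)                      # detection logit
```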
Situation assessment in Real-Time Strategy (RTS) games is crucial for understanding decision-making in complex adversarial environments. However, existing methods remain limited in processing multi-dimensional feature information and temporal dependencies. Here we propose a tri-dimensional Space-Time-Feature Transformer (TSTF Transformer) architecture, which efficiently models battlefield situations through three independent but cascaded modules: spatial attention, temporal attention, and feature attention. On a dataset comprising 3,150 adversarial experiments, the 8-layer TSTF Transformer demonstrates superior performance: achieving 58.7% accuracy in the early game (~4% progress), significantly outperforming the conventional Timesformer's 41.8%; reaching 97.6% accuracy in the mid-game (~40% progress) while maintaining low performance variation (standard deviation 0.114). Meanwhile, this architecture requires fewer parameters (4.75M) compared to the baseline model (5.54M). Our study not only provides new insights into situation assessment in RTS games but also presents an innovative paradigm for Transformer-based multi-dimensional temporal modeling.
https://arxiv.org/abs/2501.03832
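The cascaded three-module design can be sketched as attention applied along the spatial axis, then the temporal axis, followed by a feature-wise gate standing in for feature attention. The shapes and module choices below are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AxisAttention(nn.Module):
    """Self-attention applied along one axis of a (batch, time, space, dim) tensor."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        b, t, s, d = x.shape
        if axis == 1:   # temporal attention: tokens are time steps
            tokens = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        else:           # spatial attention: tokens are spatial cells
            tokens = x.reshape(b * t, s, d)
        out, _ = self.attn(tokens, tokens, tokens)
        out = self.norm(tokens + out)                      # residual + norm
        if axis == 1:
            return out.reshape(b, s, t, d).permute(0, 2, 1, 3)
        return out.reshape(b, t, s, d)

class TSTFBlock(nn.Module):
    """Cascade of spatial, temporal, and feature attention (illustrative layout)."""
    def __init__(self, dim: int):
        super().__init__()
        self.spatial = AxisAttention(dim)
        self.temporal = AxisAttention(dim)
        self.feature_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())  # feature attention as gating

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, time, space, dim)
        x = self.spatial(x, axis=2)
        x = self.temporal(x, axis=1)
        return x * self.feature_gate(x)
```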
The emergence of virtual staining technology provides a rapid and efficient alternative for researchers in tissue pathology. It enables the utilization of unlabeled microscopic samples to generate virtual replicas of chemically stained histological slices, or facilitate the transformation of one staining type into another. The remarkable performance of generative networks, such as CycleGAN, offers an unsupervised learning approach for virtual coloring, overcoming the limitations of high-quality paired data required in supervised learning. Nevertheless, large-scale color transformation necessitates processing large field-of-view images in patches, often resulting in significant boundary inconsistency and artifacts. Additionally, the transformation between different colorized modalities typically needs further efforts to modify loss functions and tune hyperparameters for independent training of networks. In this study, we introduce a general virtual staining framework that is adaptable to various conditions. We propose a loss function based on the value mapping constraint to ensure the accuracy of virtual coloring between different pathological modalities, termed the Value Mapping Generative Adversarial Network (VM-GAN). Meanwhile, we present a confidence-based tiling method to address the challenge of boundary inconsistency arising from patch-wise processing. Experimental results on diverse data with varying staining protocols demonstrate that our method achieves superior quantitative indicators and improved visual perception.
https://arxiv.org/abs/2501.03592
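The confidence-based tiling can be pictured as stitching overlapping patch outputs with per-pixel weights derived from a confidence map, so low-confidence patch regions contribute less at the seams. A simplified NumPy sketch with hypothetical inputs:

```python
import numpy as np

def blend_tiles(tiles, confidences, positions, out_shape):
    """Blend overlapping patch outputs into one image, weighting each pixel by
    the patch's confidence (illustrative; the paper's tiling rule may differ).

    tiles:       list of (h, w, c) patch outputs
    confidences: list of (h, w) per-pixel confidence maps
    positions:   list of (y, x) top-left coordinates for each tile
    out_shape:   (H, W, C) shape of the full output image
    """
    acc = np.zeros(out_shape, dtype=np.float64)
    weight = np.zeros(out_shape[:2], dtype=np.float64)
    for tile, conf, (y, x) in zip(tiles, confidences, positions):
        h, w = tile.shape[:2]
        acc[y:y + h, x:x + w] += tile * conf[..., None]
        weight[y:y + h, x:x + w] += conf
    return acc / np.maximum(weight, 1e-8)[..., None]   # confidence-weighted average
```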
Deep Reinforcement Learning (DRL) suffers from uncertainties and inaccuracies in the observation signal in real-world applications. Adversarial attack is an effective method for evaluating the robustness of DRL agents. However, existing attack methods targeting individual sampled actions have limited impacts on the overall policy distribution, particularly in continuous action spaces. To address these limitations, we propose the Distribution-Aware Projected Gradient Descent attack (DAPGD). DAPGD uses distribution similarity as the gradient perturbation input to attack the policy network, which leverages the entire policy distribution rather than relying on individual samples. We utilize the Bhattacharyya distance in DAPGD to measure policy similarity, enabling sensitive detection of subtle but critical differences between probability distributions. Our experimental results demonstrate that DAPGD achieves SOTA results compared to the baselines in three robot navigation tasks, achieving an average 22.03% higher reward drop compared to the best baseline.
https://arxiv.org/abs/2501.03562
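For diagonal-Gaussian policies (typical in continuous control), the Bhattacharyya distance has a closed form, which can then drive a PGD loop on the observation. A sketch under those assumptions; `policy(obs)` is assumed to return the mean and variance of the action distribution:

```python
import torch

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between diagonal Gaussians."""
    var_avg = 0.5 * (var1 + var2)
    term_mean = 0.125 * ((mu1 - mu2) ** 2 / var_avg).sum(-1)
    term_cov = 0.5 * (torch.log(var_avg) - 0.5 * (torch.log(var1) + torch.log(var2))).sum(-1)
    return term_mean + term_cov

def dapgd_style_attack(obs, policy, eps=0.05, alpha=0.01, steps=10):
    """PGD on the observation, maximizing the Bhattacharyya distance between the
    clean and perturbed policy distributions (illustrative sketch).
    """
    mu_clean, var_clean = policy(obs)
    mu_clean, var_clean = mu_clean.detach(), var_clean.detach()
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        mu_adv, var_adv = policy(obs + delta)
        loss = bhattacharyya_gaussian(mu_clean, var_clean, mu_adv, var_adv).mean()
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (obs + delta).detach()
```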
Self-supervised learning (SSL) has significantly advanced image representation learning, yet efficiency challenges persist, particularly with adversarial training. Many SSL methods require extensive epochs to achieve convergence, a demand further amplified in adversarial settings. To address this inefficiency, we revisit the robust EMP-SSL framework, emphasizing the importance of increasing the number of crops per image to accelerate learning. Unlike traditional contrastive learning, robust EMP-SSL leverages multi-crop sampling, integrates an invariance term and regularization, and reduces training epochs, enhancing time efficiency. Evaluated with both standard linear classifiers and multi-patch embedding aggregation, robust EMP-SSL provides new insights into SSL evaluation strategies. Our results show that robust crop-based EMP-SSL not only accelerates convergence but also achieves a superior balance between clean accuracy and adversarial robustness, outperforming multi-crop embedding aggregation. Additionally, we extend this approach with free adversarial training in Multi-Crop SSL, introducing the Cost-Free Adversarial Multi-Crop Self-Supervised Learning (CF-AMC-SSL) method. CF-AMC-SSL demonstrates the effectiveness of free adversarial training in reducing training time while simultaneously improving clean accuracy and adversarial robustness. These findings underscore the potential of CF-AMC-SSL for practical SSL applications. Our code is publicly available at this https URL.
https://arxiv.org/abs/2501.03507
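The "free" adversarial training ingredient reuses the gradients of a single backward pass to update both the model parameters and the input perturbation, replaying each minibatch a few times. A rough sketch of that loop applied to a multi-crop SSL objective (illustrative, not the paper's training code):

```python
import torch

def free_adv_multicrop_step(model, crops, loss_fn, optimizer, delta, eps=8 / 255, replays=4):
    """One "free" adversarial training step for a multi-crop SSL objective.

    `crops` is a batch of image crops, `delta` a persistent perturbation tensor
    of the same shape, and `loss_fn(model, x)` the SSL objective. The same
    backward pass updates both the model and the perturbation, so adversarial
    examples come at no extra gradient cost.
    """
    for _ in range(replays):
        adv = (crops + delta).clamp(0, 1).requires_grad_(True)
        loss = loss_fn(model, adv)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                                    # model update
        delta = (delta + eps * adv.grad.sign()).clamp(-eps, eps).detach()   # reuse the same gradients
    return delta
```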
AI humanizers are a new class of online software tools meant to paraphrase and rewrite AI-generated text in a way that allows them to evade AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects and faithfulness in preserving the meaning of the original text. We show that many existing AI detectors fail to detect humanized text. Finally, we demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate using a data-centric augmentation approach. We attack our own detector, training our own fine-tuned model optimized against our detector's predictions, and show that our detector's cross-humanizer generalization is sufficient to remain robust to this attack.
https://arxiv.org/abs/2501.03437
Advancements in robotics have opened possibilities to automate tasks in various fields such as manufacturing, emergency response and healthcare. However, a significant challenge that prevents robots from operating in real-world environments effectively is out-of-distribution (OOD) situations, wherein robots encounter unforeseen situations. One major OOD situation arises when robots encounter faults, making fault adaptation essential for real-world robot operation. Current state-of-the-art reinforcement learning algorithms show promising results but suffer from sample inefficiency, leading to low adaptation speed due to their limited ability to generalize to OOD situations. Our research is a step towards adding hardware fault tolerance and fast fault adaptability to machines. In this research, our primary focus is to investigate the efficacy of generative flow networks in robotic environments, particularly in the domain of machine fault adaptation. We simulate a robotic environment called Reacher in our experiments and modify it to introduce four distinct fault environments that replicate real-world machine/robot malfunctions. The empirical evaluation of this research indicates that continuous generative flow networks (CFlowNets) indeed have the capability to add adaptive behaviors in machines under adversarial conditions. Furthermore, the comparative analysis of CFlowNets with reinforcement learning algorithms also provides some key insights into the performance in terms of adaptation speed and sample efficiency. Additionally, a separate study investigates the implications of transferring knowledge from the pre-fault task to post-fault environments. Our experiments confirm that CFlowNets has the potential to be deployed in a real-world machine and can demonstrate adaptability in case of malfunctions to maintain functionality.
https://arxiv.org/abs/2501.03405
Scenario-based training has been widely adopted in many public service sectors. Recent advancements in Large Language Models (LLMs) have shown promise in simulating diverse personas to create these training scenarios. However, little is known about how LLMs can be developed to simulate victims for scenario-based training purposes. In this paper, we introduce VicSim (victim simulator), a novel model that addresses three key dimensions of user simulation: informational faithfulness, emotional dynamics, and language style (e.g., grammar usage). We pioneer the integration of scenario-based victim modeling with GAN-based training workflow and key-information-based prompting, aiming to enhance the realism of simulated victims. Our adversarial training approach teaches the discriminator to recognize grammar and emotional cues as reliable indicators of synthetic content. According to evaluations by human raters, the VicSim model outperforms GPT-4 in terms of human-likeness.
https://arxiv.org/abs/2501.03139
Recent successful approaches in self-supervised learning (SSL) model spatial co-occurrences of visual features either by masking portions of an image or by aggressively cropping it. Here, we propose a new way to model spatial co-occurrences by aligning local representations (before pooling) with a global image representation. We present CO-SSL, a family of instance discrimination methods, and show that it outperforms previous methods on several datasets, including ImageNet-1K, where it achieves 71.5% Top-1 accuracy with 100 pre-training epochs. CO-SSL is also more robust to noise corruption, internal corruption, small adversarial attacks, and large training crop sizes. Our analysis further indicates that CO-SSL learns highly redundant local representations, which offers an explanation for its robustness. Overall, our work suggests that aligning local and global representations may be a powerful principle of unsupervised category learning.
https://arxiv.org/abs/2501.02860
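The local-global alignment principle can be sketched as pulling each pre-pooling local feature vector toward the pooled global representation of the same image, here with a simple cosine objective; the actual CO-SSL instance-discrimination losses are richer:

```python
import torch
import torch.nn.functional as F

def local_global_alignment_loss(feature_map: torch.Tensor) -> torch.Tensor:
    """Align local (pre-pool) representations with the global image representation.

    feature_map: (N, C, H, W) backbone features before global pooling.
    Each of the H*W local vectors is pulled toward the pooled global vector
    via cosine similarity (illustrative objective).
    """
    n, c, h, w = feature_map.shape
    local = feature_map.flatten(2).transpose(1, 2)              # (N, H*W, C) local vectors
    global_rep = feature_map.mean(dim=(2, 3))                   # (N, C) global average pool
    sim = F.cosine_similarity(local, global_rep.unsqueeze(1), dim=-1)  # (N, H*W)
    return (1 - sim).mean()
```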
In recent years, attention-based models have excelled across various domains but remain vulnerable to backdoor attacks, often introduced when downloading or fine-tuning on poisoned datasets. Many current methods to mitigate backdoors in NLP models rely on the pre-trained (unfine-tuned) weights, but these methods fail in scenarios where the pre-trained weights are not available. In this work, we propose MBTSAD, which can mitigate backdoors in the language model by utilizing only a small subset of clean data and does not require pre-trained weights. Specifically, MBTSAD retrains the backdoored model on a dataset generated by token splitting. Then MBTSAD applies attention distillation, with the retrained model as the teacher and the original backdoored model as the student. Experimental results demonstrate that MBTSAD achieves comparable backdoor mitigation performance as the methods based on pre-trained weights while maintaining the performance on clean data. MBTSAD does not rely on pre-trained weights, enhancing its utility in scenarios where pre-trained weights are inaccessible. In addition, we simplify the min-max problem of adversarial training and visualize text representations to discover that the token splitting method in MBTSAD's first step generates Out-of-Distribution (OOD) data, leading the model to learn more generalized features and eliminate backdoor patterns.
https://arxiv.org/abs/2501.02754
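Two pieces of the pipeline lend themselves to a short sketch: token splitting, which breaks words into sub-pieces to produce slightly out-of-distribution retraining text, and attention distillation, which matches the backdoored student's attention maps to the retrained teacher's. Both are illustrative approximations of the paper's components:

```python
import random
import torch
import torch.nn.functional as F

def token_split(tokens: list[str], split_prob: float = 0.3) -> list[str]:
    """Randomly split tokens into two halves (illustrative), producing slightly
    out-of-distribution text that breaks memorized trigger patterns."""
    out = []
    for tok in tokens:
        if len(tok) > 3 and random.random() < split_prob:
            cut = random.randint(1, len(tok) - 1)
            out += [tok[:cut], tok[cut:]]
        else:
            out.append(tok)
    return out

def attention_distillation_loss(student_attn: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
    """MSE between student and (retrained) teacher attention maps, both shaped
    (layers, heads, seq, seq); the teacher is treated as a fixed target."""
    return F.mse_loss(student_attn, teacher_attn.detach())
```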