Using a stochastic gradient approach, we study the properties of adversarial perturbations that result in noticeable growth of the VMAF image quality metric. The structure of the perturbations is investigated as a function of the acceptable PSNR values and through Fourier power spectrum computations for the perturbations. It is demonstrated that a moderate variation of image brightness ($\sim 10$ pixel units in a restricted region of an image) can result in VMAF growth of $\sim 60\%$. Unlike some other methods demonstrating similar VMAF growth, the subjective quality of an image remains almost unchanged. It is also shown that the adversarial perturbations may exhibit an approximately linear dependence of perturbation amplitudes on image brightness. The perturbations are studied using direct VMAF optimization in PyTorch. Significant discrepancies between the metric values and subjective judgements are also demonstrated when image restoration from noise is carried out using the same direct VMAF optimization.
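Using a stochastic gradient method, we study the properties of adversarial perturbations that cause a noticeable increase in the VMAF image quality metric. The structure of these perturbations is explored according to the acceptable PSNR values and by computing their Fourier power spectra. The study shows that a moderate change in image brightness (about 10 pixel units within a restricted region) can increase VMAF by about 60%. Unlike some methods that show similar VMAF growth, the subjective image quality remains almost unchanged. It is also shown that the adversarial perturbations may exhibit an approximately linear dependence, with perturbation amplitude varying with image brightness. These perturbations were studied on the basis of direct VMAF optimization in PyTorch. When the same method (direct VMAF optimization) is used to restore images from noise, significant discrepancies between the metric values and subjective judgements are also demonstrated. Overall, this work reveals some problems and challenges in direct VMAF-based optimization and highlights the possible limitations of relying on a single quality metric for image processing.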
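A little arithmetic makes the PSNR budget mentioned above concrete. The sketch below is a textbook PSNR computation, not the authors' code. It assumes 8-bit images (peak value 255) flattened into plain Python lists. `psnr_of_local_shift` is a hypothetical helper that shows how a ±10 brightness shift confined to a fraction of the pixels translates into PSNR.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def psnr_of_local_shift(delta, fraction, max_value=255.0):
    """Hypothetical helper: PSNR when `fraction` of the pixels are shifted by +/-delta."""
    mse = fraction * delta ** 2
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100.0] * 64
    dist = [v + 10.0 for v in ref[:16]] + ref[16:]  # brighten one quarter of the pixels
    print(round(psnr(ref, dist), 2))
    print(round(psnr_of_local_shift(10.0, 1.0), 2))
```

Shifting a quarter of the pixels by 10 gives an MSE of 25 and a PSNR of about 34.15 dB. Shifting every pixel by 10 gives about 28.13 dB. This is why a perturbation restricted to a region can keep PSNR comparatively high.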
https://arxiv.org/abs/2503.14111
Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient labels, these models often latch onto artifacts or allow anatomically implausible segmentations. In this paper, we present a simple yet effective pseudo-labeling method with an adversarially learned shape prior to regularize segmentations. Specifically, we devise an encoder-twin-decoder network where the shape prior acts as an implicit shape model, penalizing anatomically implausible but not ground-truth-deviating predictions. Without bells and whistles, our simple approach achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at this https URL.
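Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep up. Automated segmentation can help address this problem, but it requires large annotated datasets, which are often scarce. Semi-supervised learning, which leverages both unlabeled and limited labeled data, is a promising approach. Current state-of-the-art methods use consistency regularization or pseudo-labeling, but they are becoming increasingly complex. Without sufficient labels, these models often latch onto artifacts in the images or allow anatomically implausible segmentations. This paper proposes a simple yet effective pseudo-labeling method that uses an adversarially learned shape prior to regularize the segmentations. Specifically, we design an encoder-twin-decoder network in which the shape prior acts as an implicit shape model. It penalizes predictions that are anatomically implausible rather than those that merely deviate from the ground truth. Without any added complexity, our simple method achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at this https URL.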
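Methods of this kind typically start from confidence-thresholded pseudo-labeling, which can be sketched as follows. This sketch is an illustrative assumption, not the paper's method. The threshold, the `IGNORE_INDEX` marker, and the flat per-pixel probability lists are hypothetical. The adversarially learned shape prior that regularizes the segmentations is not modeled.

```python
IGNORE_INDEX = -1  # hypothetical marker for pixels excluded from the supervised loss


def pseudo_labels(prob_maps, threshold=0.9):
    """Turn per-pixel class probabilities into hard pseudo-labels.

    prob_maps: list of per-pixel probability lists (one entry per class).
    A pixel keeps its argmax class only if that probability reaches `threshold`;
    otherwise it is marked with IGNORE_INDEX.
    """
    labels = []
    for probs in prob_maps:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else IGNORE_INDEX)
    return labels


if __name__ == "__main__":
    teacher_probs = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]
    print(pseudo_labels(teacher_probs))
```

The threshold trades label coverage for label quality. Pixels marked `IGNORE_INDEX` would simply be skipped when computing the segmentation loss.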
https://arxiv.org/abs/2503.13987
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence by facilitating integrated understanding across diverse modalities, including text, images, video, audio, and speech. However, their deployment in real-world applications raises significant concerns about adversarial vulnerabilities that could compromise their safety and reliability. Unlike unimodal models, MLLMs face unique challenges due to the interdependencies among modalities, making them susceptible to modality-specific threats and cross-modal adversarial manipulations. This paper reviews the adversarial robustness of MLLMs, covering different modalities. We begin with an overview of MLLMs and a taxonomy of adversarial attacks tailored to each modality. Next, we review key datasets and evaluation metrics used to assess the robustness of MLLMs. After that, we provide an in-depth review of attacks targeting MLLMs across different modalities. Our survey also identifies critical challenges and suggests promising future research directions.
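Multimodal large language models (MLLMs) have shown outstanding performance in artificial intelligence by enabling integrated understanding across modalities such as text, images, video, audio, and speech. However, deploying them in real-world applications raises serious concerns about adversarial vulnerabilities that could compromise their safety and reliability. Compared with unimodal models, MLLMs face unique challenges because of the interdependencies among modalities. These interdependencies make them susceptible to modality-specific threats and cross-modal adversarial manipulation. This paper reviews the adversarial robustness of MLLMs across different modalities. We first give an overview of MLLMs and introduce a taxonomy of adversarial attacks tailored to each modality. Next, we review the key datasets and evaluation metrics used to assess the robustness of MLLMs. We then examine in depth the attacks targeting MLLMs across different modalities. Our survey also identifies critical challenges and proposes promising directions for future research. Through this work, we hope to deepen the understanding of potential security threats to multimodal models and to provide guidance for developing more robust and reliable AI systems.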
https://arxiv.org/abs/2503.13962
The fine-tuning technique for text-to-image diffusion models facilitates image customization but risks privacy breaches and opinion manipulation. Current research focuses on prompt- or image-level adversarial attacks for anti-customization, yet it overlooks the correlation between these two levels and the relationship between internal modules and inputs. This hinders anti-customization performance in practical threat scenarios. We propose Dual Anti-Diffusion (DADiff), a two-stage adversarial attack targeting diffusion customization, which, for the first time, integrates the adversarial prompt-level attack into the generation process of image-level adversarial examples. In stage 1, we generate prompt-level adversarial vectors to guide the subsequent image-level attack. In stage 2, besides conducting the end-to-end attack on the UNet model, we disrupt its self- and cross-attention modules, aiming to break the correlations between image pixels and align the cross-attention results computed using instance prompts and adversarial prompt vectors within the images. Furthermore, we introduce a local random timestep gradient ensemble strategy, which updates adversarial perturbations by integrating random gradients from multiple segmented timesets. Experimental results on various mainstream facial datasets demonstrate 10%-30% improvements in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization with DADiff compared to existing methods.
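Fine-tuning techniques for text-to-image diffusion models facilitate image customization but also bring risks of privacy leakage and opinion manipulation. Current research focuses on prompt-level or image-level adversarial attacks to prevent customization. However, it overlooks the correlation between these two levels and the relationship between internal modules and inputs, which limits anti-customization performance in practical threat scenarios. To address this, we propose Dual Anti-Diffusion (DADiff), a two-stage adversarial attack targeting the customization of diffusion models. DADiff is the first method to integrate a prompt-level adversarial attack into the generation of image-level adversarial examples. In the first stage, we generate prompt-level adversarial vectors to guide the subsequent image-level attack. In the second stage, in addition to an end-to-end attack on the whole UNet model, we disrupt its self-attention and cross-attention modules. The aim is to break the correlations between image pixels and to align, within the images, the cross-attention results computed with instance prompts with those computed with the adversarial prompt vectors. We further introduce a local random timestep gradient ensemble strategy that updates the adversarial perturbation by integrating random gradients from multiple segmented time intervals. Experiments on various mainstream facial datasets show that, compared with existing techniques, DADiff improves cross-prompt, keyword-mismatch, cross-model, and cross-mechanism anti-customization performance by 10%-30%.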
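One reading of the local random timestep gradient ensemble is as follows. Split the diffusion timestep range into segments, draw one random timestep per segment, average the gradients, and take a signed step. The sketch below follows that reading under stated assumptions. `grad_fn` is a hypothetical stand-in for the gradient of the attack loss through the UNet. The segment count, step size, and L-infinity budget are illustrative values, not values from the paper.

```python
import random


def segmented_timesteps(num_timesteps, num_segments, rng):
    """Split [0, num_timesteps) into contiguous segments; draw one random timestep from each."""
    if not 0 < num_segments <= num_timesteps:
        raise ValueError("need 0 < num_segments <= num_timesteps")
    bounds = [i * num_timesteps // num_segments for i in range(num_segments + 1)]
    return [rng.randrange(lo, hi) for lo, hi in zip(bounds, bounds[1:])]


def ensemble_gradient(grad_fn, perturbation, timesteps):
    """Average grad_fn(perturbation, t) element-wise over the sampled timesteps."""
    grads = [grad_fn(perturbation, t) for t in timesteps]
    return [sum(g[i] for g in grads) / len(grads) for i in range(len(perturbation))]


def sign_step(perturbation, grad, step_size, epsilon):
    """Signed-gradient ascent step, then clip each entry to [-epsilon, epsilon] (PGD-style)."""
    out = []
    for p, g in zip(perturbation, grad):
        s = (g > 0) - (g < 0)  # sign of the gradient as -1, 0 or 1
        out.append(max(-epsilon, min(epsilon, p + step_size * s)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    delta = [0.0, 0.0, 0.0]
    toy_grad = lambda d, t: [(t / 1000.0) - 0.5, 1.0, -1.0]  # hypothetical loss gradient
    for _ in range(5):
        ts = segmented_timesteps(1000, 4, rng)
        delta = sign_step(delta, ensemble_gradient(toy_grad, delta, ts), 0.01, 0.03)
    print(delta)
```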
https://arxiv.org/abs/2503.13945
Unsupervised anomaly detection using deep learning has garnered significant research attention due to its broad applicability, particularly in medical imaging where labeled anomalous data are scarce. While earlier approaches leverage generative models like autoencoders and generative adversarial networks (GANs), they often fall short due to overgeneralization. Recent methods explore various strategies, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation, to enhance discrimination. Among these, knowledge distillation, particularly reverse distillation, has shown promise. Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model that addresses two key limitations of existing reverse distillation methods: insufficient feature discriminability and inability to handle anomaly scale variations. Specifically, we introduce a contrastive student-teacher learning approach to derive more discriminative representations by generating and exploring out-of-normal distributions. Further, we design a scale adaptation mechanism to softly weight contrastive distillation losses at different scales to account for the scale variation issue. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, validating the efficacy of the proposed method. Code is available at this https URL.
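Unsupervised anomaly detection with deep learning has attracted considerable research attention because of its broad applicability, particularly in medical imaging, where labeled anomalous data are extremely scarce. Earlier methods mostly used generative models such as autoencoders and generative adversarial networks (GANs), but these often perform poorly because of overgeneralization. Recent studies have explored various strategies to improve discrimination in anomaly detection, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation. Among these, reverse distillation in particular has shown promise. Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model. It addresses two main problems of existing reverse distillation methods: insufficient feature discriminability and the inability to handle variations in anomaly scale. Specifically, we introduce a contrastive student-teacher learning mechanism that derives more discriminative representations by generating and exploring out-of-normal distributions. In addition, we design a scale adaptation mechanism that softly weights the contrastive distillation losses at different scales to address scale variation. Extensive experiments on benchmark datasets show that our method achieves state-of-the-art performance, demonstrating the effectiveness of the proposed approach. Code is available at this https URL.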
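A softmax over one logit per scale is one way to softly weight per-scale contrastive distillation losses. The sketch below assumes that form for the mechanism. How the logits are produced (learned parameters, or statistics of the anomaly map) is not specified here, and `scale_logits` is a hypothetical input.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax; lower temperature gives sharper weights."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale_weighted_loss(per_scale_losses, scale_logits, temperature=1.0):
    """Combine losses from several feature scales with soft (softmax) weights.

    Returns the weighted loss and the weights so they can be inspected.
    """
    weights = softmax(scale_logits, temperature)
    loss = sum(w * l for w, l in zip(weights, per_scale_losses))
    return loss, weights


if __name__ == "__main__":
    # Hypothetical losses at three scales, with the logits favoring the coarsest scale.
    print(scale_weighted_loss([0.8, 0.5, 0.2], [0.0, 0.5, 1.0]))
```

Soft weights keep every scale in the objective while letting the model emphasize whichever scale best matches the anomaly size. Hard selection of a single scale would be the non-soft alternative.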
https://arxiv.org/abs/2503.13828
Single Domain Generalization (SDG) aims to train models with consistent performance across diverse scenarios using data from a single source. While latent diffusion models (LDMs) show promise in augmenting limited source data, we demonstrate that directly using synthetic data can be detrimental due to significant feature distribution discrepancies between synthetic and real target domains, leading to performance degradation. To address this issue, we propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization. We employ LDMs to produce diverse pseudo-target domain samples and introduce two key modules to handle distribution bias. First, the Discriminative Feature Decoupling and Reassembly (DFDR) module uses entropy-guided attention to recalibrate channel-level features, suppressing synthetic noise while preserving semantic consistency. Second, the Multi-pseudo-domain Soft Fusion (MDSF) module uses adversarial training with latent-space feature interpolation, creating continuous feature transitions between domains. Extensive SDG experiments on object detection and semantic segmentation tasks demonstrate that DRSF achieves substantial performance gains with only marginal computational overhead. Notably, DRSF's plug-and-play architecture enables seamless integration with unsupervised domain adaptation paradigms, underscoring its broad applicability in addressing diverse and real-world domain challenges.
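Single Domain Generalization (SDG) aims to train models with data from a single source so that they maintain consistent performance across diverse scenarios. Using latent diffusion models (LDMs) to augment limited source data shows promise. However, we show that directly using synthetic data can hurt model performance because of large feature distribution discrepancies between synthetic and real target domains. To address this problem, we propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework that leverages synthetic data to improve model generalization. Our method uses LDMs to generate diverse pseudo-target-domain samples and introduces two key modules to handle distribution bias. The first is the Discriminative Feature Decoupling and Reassembly (DFDR) module. It uses entropy-guided attention to recalibrate features at the channel level, suppressing synthetic noise while preserving semantic consistency. The second is the Multi-pseudo-domain Soft Fusion (MDSF) module. It performs adversarial training with latent-space feature interpolation to create continuous feature transitions between domains. Extensive SDG experiments on object detection and semantic segmentation show that DRSF achieves substantial performance gains with only a small increase in computational cost. Notably, DRSF uses a plug-and-play architecture, so it can be seamlessly integrated into unsupervised domain adaptation paradigms, which underscores its broad applicability to diverse real-world domain challenges.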
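The MDSF module creates continuous transitions between domains through latent-space interpolation. This can be sketched as a convex combination of features drawn from two pseudo-domains. The following are assumptions: the Beta-distributed mixing coefficient (as in mixup), the dictionary layout `features_by_domain`, and `alpha=0.4`. The adversarial discriminator that MDSF trains alongside the interpolation is omitted.

```python
import random


def interpolate_features(feat_a, feat_b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length feature vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def random_mix(features_by_domain, rng, alpha=0.4):
    """Pick two distinct pseudo-domains, one feature from each, and mix them.

    The mixing weight is drawn from Beta(alpha, alpha), a mixup-style assumption.
    """
    dom_a, dom_b = rng.sample(sorted(features_by_domain), 2)
    lam = rng.betavariate(alpha, alpha)
    feat_a = rng.choice(features_by_domain[dom_a])
    feat_b = rng.choice(features_by_domain[dom_b])
    return interpolate_features(feat_a, feat_b, lam), lam


if __name__ == "__main__":
    rng = random.Random(1)
    domains = {"fog": [[0.0, 1.0]], "night": [[1.0, 0.0]], "rain": [[0.5, 0.5]]}
    print(random_mix(domains, rng))
```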
https://arxiv.org/abs/2503.13617
The synergy between virtual reality (VR) and artificial intelligence (AI), specifically deep learning (DL)-based cybersickness detection models, has ushered in unprecedented advancements in immersive experiences. These models automatically detect cybersickness severity and adaptively apply various mitigation techniques, offering a smooth and comfortable VR experience. While this DL-enabled cybersickness detection method provides promising solutions for enhancing user experiences, it also introduces new risks, since these models are vulnerable to adversarial attacks. A small perturbation of the input data that is visually undetectable to human observers can fool the cybersickness detection model and trigger unexpected mitigation, thus disrupting user immersive experiences (UIX) and even posing safety risks. In this paper, we present a new type of VR attack, i.e., a cybersickness attack, which successfully stops the triggering of cybersickness mitigation by fooling DL-based cybersickness detection models and dramatically hinders the UIX. Next, we propose a novel explainable artificial intelligence (XAI)-guided cybersickness attack detection framework to detect such attacks in VR to ensure UIX and a comfortable VR experience. We evaluate the proposed attack and the detection framework using two state-of-the-art open-source VR cybersickness datasets: Simulation 2021 and Gameplay dataset. Finally, to verify the effectiveness of our proposed method, we implement the attack and the XAI-based detection using a testbed with a custom-built VR roller coaster simulation with an HTC Vive Pro Eye headset and perform a user study. Our study shows that such an attack can dramatically hinder the UIX. However, our proposed XAI-guided cybersickness attack detection can successfully detect cybersickness attacks and trigger the proper mitigation, effectively reducing VR cybersickness.
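The synergy between virtual reality (VR) and artificial intelligence (AI), particularly deep learning (DL)-based cybersickness detection models, has brought unprecedented advances to immersive experiences. These models automatically detect cybersickness severity and adaptively apply various mitigation techniques, providing a smooth and comfortable VR experience. This DL-based cybersickness detection approach offers a promising way to improve user experience, but it also introduces new risks, because these models are vulnerable to adversarial attacks. A small perturbation of the input data, invisible to the human eye, can mislead the cybersickness detection model and trigger unexpected mitigation, disrupting the user immersive experience (UIX) and even posing safety risks. In this paper, we present a new type of VR attack, the cybersickness attack. It successfully prevents the mitigation driven by DL-based cybersickness detection models from being triggered and severely hinders the UIX. We then propose a novel explainable artificial intelligence (XAI)-guided cybersickness attack detection framework to identify such attacks in VR, ensuring the UIX and a comfortable VR experience. We evaluate the proposed attack and its detection framework on two state-of-the-art open-source VR cybersickness datasets, Simulation 2021 and Gameplay. Finally, to verify the effectiveness of our method, we implement the attack and the XAI-based detection on a testbed and conduct a user study. The testbed combines a custom-built VR roller coaster simulation with an HTC Vive Pro Eye headset. Our study shows that such an attack can significantly hinder the UIX. However, our proposed XAI-guided cybersickness attack detection can successfully identify cybersickness attacks and trigger appropriate mitigation, effectively reducing cybersickness in VR.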
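The sketch below shows how a small input perturbation can suppress a detector's "cybersick" output. It applies the fast gradient sign method (FGSM) to a toy logistic-regression detector. FGSM and the two-feature linear model are swapped-in assumptions for illustration only. The paper's DL models, input features, and attack procedure are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive ('cybersick') class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, label, epsilon):
    """Single FGSM step that increases binary cross-entropy with respect to `label`.

    For a logistic model, d(BCE)/dx_i = (p - label) * w_i, so only the sign
    of that expression is needed.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0  # hypothetical detector parameters
    x = [1.0, 0.5]           # hypothetical input features
    x_adv = fgsm(w, b, x, label=1, epsilon=0.8)
    print(predict(w, b, x), predict(w, b, x_adv))
```

With these toy weights, the clean input scores above 0.5 (sick). The perturbed input `[0.2, 1.3]` scores below 0.5, so a mitigation keyed on that score would not fire.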
https://arxiv.org/abs/2503.13419
To this day, accurately simulating local-scale precipitation and reliably reproducing its distribution remains a challenging task. The limited horizontal resolution of Global Climate Models is among the primary factors undermining their skill in this context. The physical mechanisms driving the onset and development of precipitation, especially in extreme events, operate at spatio-temporal scales smaller than those numerically resolved, and are thus difficult to capture accurately. To circumvent this limitation, several downscaling approaches have been developed over the last decades to address the discrepancy between the spatial resolution of model output and the resolution required by local-scale applications. In this paper, we introduce RainScaleGAN, a conditional deep convolutional Generative Adversarial Network (GAN) for precipitation downscaling. GANs have been effectively used in image super-resolution, an approach highly relevant for downscaling tasks. RainScaleGAN's capabilities are tested in a perfect-model setup, where the spatial resolution of a precipitation dataset is artificially degraded from 0.25$^{\circ}\times$0.25$^{\circ}$ to 2$^{\circ}\times$2$^\circ$, and RainScaleGAN is used to restore it. The developed model outperforms one of the leading precipitation downscaling methods found in the literature. RainScaleGAN not only generates a synthetic dataset featuring plausible high-resolution spatial patterns and intensities, but also produces a precipitation distribution with statistics closely mirroring those of the ground-truth dataset. Given that RainScaleGAN's approach is agnostic with respect to the underlying physics, the method has the potential to be applied to other physical variables such as surface winds or temperature.
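To this day, accurately simulating local-scale precipitation and reliably reproducing its distribution remains a challenging task. The limited horizontal resolution of Global Climate Models is one of the main factors weakening their performance in this area. The physical mechanisms driving the onset and development of precipitation, especially in extreme events, operate at spatial and temporal scales smaller than those that can be numerically resolved. They are therefore difficult to capture accurately. To circumvent this limitation, several downscaling methods have been developed over the past decades to address the discrepancy between the spatial resolution of model output and the resolution required by local-scale applications. This paper introduces RainScaleGAN, a conditional deep convolutional generative adversarial network (GAN) for precipitation downscaling. GANs have been used effectively in image super-resolution, an approach highly relevant to downscaling tasks. RainScaleGAN's capabilities are tested in a perfect-model setup. A precipitation dataset with a spatial resolution of 0.25°×0.25° is artificially degraded to 2°×2°, and RainScaleGAN is used to restore it to the original resolution. The developed model outperforms one of the leading precipitation downscaling methods in the literature. RainScaleGAN not only generates a synthetic dataset with plausible high-resolution spatial patterns and intensities, but also produces a precipitation distribution whose statistics closely match those of the ground-truth dataset. Because RainScaleGAN's approach does not depend on the underlying physics, it could potentially be applied to other physical variables such as surface wind speed or temperature.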
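The perfect-model degradation from 0.25° to 2° is a factor of 2 / 0.25 = 8 along each axis, so 64 fine cells map onto one coarse cell. Below is a minimal sketch of that coarsening. It assumes simple block averaging on a regular latitude-longitude grid stored as a list of rows; the paper's exact regridding scheme may differ.

```python
FACTOR = round(2.0 / 0.25)  # 0.25 deg -> 2 deg is an 8x coarsening per axis


def coarsen(grid, factor):
    """Block-average a 2D grid (list of equal-length rows) by an integer factor per axis."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    out = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [grid[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))  # mean precipitation in the coarse cell
        out.append(row)
    return out


if __name__ == "__main__":
    fine = [[16 * i + j for j in range(16)] for i in range(16)]  # toy 16x16 field
    print(coarsen(fine, FACTOR))  # a 2x2 coarse field
```

Block averaging preserves the domain-mean value, which is a convenient sanity check when building low-/high-resolution training pairs.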
https://arxiv.org/abs/2503.13316
Recently, histopathology vision-language foundation models (VLMs) have gained popularity due to their enhanced performance and generalizability across different downstream tasks. However, most existing histopathology benchmarks are either unimodal or limited in the diversity of clinical tasks, organs, and acquisition instruments. Due to patient data privacy, they are also often only partially available to the public. As a consequence, existing histopathology VLMs lack a comprehensive evaluation in a unified benchmark setting that better reflects a wide range of clinical scenarios. To address this gap, we introduce HistoVL, a fully open-source comprehensive benchmark comprising images acquired using up to 11 different acquisition tools. The images are paired with specifically crafted captions that incorporate class names and diverse pathology descriptions. Our Histo-VL includes 26 organs, 31 cancer types, and a wide variety of tissues obtained from 14 heterogeneous patient cohorts. It totals more than 5 million patches obtained from over 41K WSIs viewed under various magnification levels. We systematically evaluate existing histopathology VLMs on Histo-VL to simulate diverse tasks performed by experts in real-world clinical scenarios. Our analysis reveals several interesting findings, all of which can affect the clinical implementation of these models:
- high sensitivity of most existing histopathology VLMs to textual changes, with a drop in balanced accuracy of up to 25% in tasks such as metastasis detection;
- low robustness to adversarial attacks;
- improper calibration, evident from high ECE values and low model prediction confidence.
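Recently, histopathology vision-language foundation models (VLMs) have become increasingly popular thanks to their improved performance and generalization across different downstream tasks. However, most existing histopathology benchmarks are either unimodal or limited in the diversity of clinical tasks, organs, and acquisition instruments. Because of patient data privacy, they are also often only partially available to the public. As a result, existing histopathology VLMs lack a comprehensive evaluation in a unified benchmark setting that better reflects a wide range of clinical scenarios. To address this gap, we introduce HistoVL, a fully open-source, comprehensive benchmark containing images acquired with up to 11 different acquisition tools. The images are paired with specifically crafted captions that combine class names and diverse pathology descriptions. Our Histo-VL covers 26 organs, 31 cancer types, and a wide variety of tissue samples from 14 heterogeneous patient cohorts. It totals more than 5 million patches from over 41,000 WSIs (whole-slide images) viewed at various magnification levels. We systematically evaluate existing histopathology VLMs on Histo-VL to simulate the diverse tasks performed by experts in real-world clinical scenarios. Our analysis reveals several interesting findings, all of which can affect the clinical application of these models:
- high sensitivity of most existing models to textual changes, with balanced accuracy dropping by up to 25% in tasks such as metastasis detection;
- low robustness to adversarial attacks;
- improper calibration, evident from high ECE values and low prediction confidence.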
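Two metrics cited above, balanced accuracy and the expected calibration error (ECE), can be computed as follows. This is a common textbook formulation with equal-width confidence bins. The bin count of 10 is an assumption, and the benchmark's own evaluation code may differ in detail.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as frequent ones."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE = sum over bins of |B| / n * |accuracy(B) - mean confidence(B)|."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * num_bins), num_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / n * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 90% confidence but only 50% correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))
```

A high ECE with low confidence, as reported above, means that the stated probabilities do not track how often the model is actually right.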
https://arxiv.org/abs/2503.12990
Large pre-trained vision-language models (VLMs), such as CLIP, demonstrate impressive generalization but remain highly vulnerable to adversarial examples (AEs). Previous work has explored robust text prompts through adversarial training, achieving some improvement in both robustness and generalization. However, these methods primarily rely on single-gradient-direction perturbations (e.g., PGD) to generate AEs, which lack diversity, resulting in limited improvement in adversarial robustness. To address these limitations, we propose an evolution-based region adversarial prompt tuning method called ER-APT, which combines gradient methods with genetic evolution to generate more diverse and challenging AEs. In each training iteration, we first generate AEs using traditional gradient-based methods. Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs, ensuring a broader and more aggressive perturbation […]. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning. We also propose a dynamic loss weighting method to adjust prompt learning efficiency for accuracy and robustness. Experimental evaluations on various benchmark datasets demonstrate the superiority of our proposed method, which outperforms state-of-the-art APT methods. The code is released at this https URL.
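Large pre-trained vision-language models (VLMs) such as CLIP show excellent generalization but remain highly vulnerable to adversarial examples (AEs). Previous work has explored robust text prompts through adversarial training, achieving some improvement in both robustness and generalization. However, these methods mainly rely on single-gradient-direction perturbations (e.g., PGD) to generate AEs. Such AEs lack diversity and therefore yield only limited gains in adversarial robustness. To overcome these limitations, we propose ER-APT, an evolution-based region adversarial prompt tuning method that combines gradient methods with genetic evolution to generate more diverse and more challenging AEs. In each training iteration, AEs are first generated with traditional gradient-based methods. A genetic evolution mechanism with selection, mutation, and crossover is then applied to optimize the AEs, ensuring a broader range of perturbations. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning. We also propose a dynamic loss weighting method that adjusts prompt learning efficiency to balance accuracy and robustness. Experimental evaluations on multiple benchmark datasets show that our method outperforms state-of-the-art APT (adversarial prompt tuning) methods. The code is released at this https URL.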
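One generation of the selection, crossover, and mutation loop applied to gradient-initialized AEs can be sketched as below. Everything here is an illustrative assumption. Perturbations are flat lists, and `fitness` is a hypothetical stand-in for the adversarial loss on the VLM. Tournament selection, uniform crossover, Gaussian mutation, elitism, and L-infinity clipping are generic choices, not the operators specified by ER-APT.

```python
import random


def clip(values, epsilon):
    """Project a perturbation onto the L-infinity ball of radius epsilon."""
    return [max(-epsilon, min(epsilon, v)) for v in values]


def evolve(population, fitness, rng, epsilon, mutation_std=0.01, elite=2):
    """Produce the next generation of perturbations (higher fitness is better).

    The `elite` fittest members are copied unchanged. The rest are children of
    two tournament-selected parents (uniform crossover plus Gaussian mutation),
    clipped back into the perturbation budget.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    next_gen = [list(p) for p in ranked[:elite]]

    def tournament():
        a, b = rng.sample(ranked, 2)
        return a if fitness(a) >= fitness(b) else b

    while len(next_gen) < len(population):
        p1, p2 = tournament(), tournament()
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
        child = [c + rng.gauss(0.0, mutation_std) for c in child]
        next_gen.append(clip(child, epsilon))
    return next_gen


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical start: 8 perturbations of length 5, e.g. from PGD runs.
    pop = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(8)]
    for _ in range(20):
        pop = evolve(pop, sum, rng, epsilon=0.1)  # toy fitness: sum of entries
    print(max(sum(p) for p in pop))
```

Elitism guarantees that the best fitness never decreases between generations. Crossover and mutation explore a region around the gradient-based starting points instead of a single point.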
https://arxiv.org/abs/2503.12874
Existing score-based adversarial attacks mainly focus on crafting $top$-1 adversarial examples against classifiers with single-label classification. Their attack success rate and query efficiency are often less than satisfactory, particularly under small perturbation requirements; moreover, the vulnerability of classifiers with multi-label learning has yet to be studied. In this paper, we propose a comprehensive surrogate-free score-based attack, named the geometric score-based black-box attack (GSBAK$^K$), to craft adversarial examples in an aggressive $top$-$K$ setting for both untargeted and targeted attacks, where the goal is to change the $top$-$K$ predictions of the target classifier. We introduce novel gradient-based methods to find a good initial boundary point to attack. Our iterative method employs novel gradient estimation techniques on the decision boundary that effectively exploit its geometry and are particularly effective in the $top$-$K$ setting. Additionally, GSBAK$^K$ can be used to attack classifiers with $top$-$K$ multi-label learning. Extensive experimental results on ImageNet and PASCAL VOC datasets validate the effectiveness of GSBAK$^K$ in crafting $top$-$K$ adversarial examples.
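Existing score-based adversarial attacks mainly focus on crafting $top$-1 adversarial examples against single-label classifiers. Their attack success rate and query efficiency are often unsatisfactory, especially under small perturbation requirements; moreover, the vulnerability of multi-label classifiers has not been sufficiently studied. This paper proposes a comprehensive, surrogate-free, score-based black-box attack named the geometric score-based black-box attack (GSBAK$^K$). It crafts adversarial examples in an aggressive $top$-$K$ setting, for both untargeted and targeted attacks, where the goal is to change the $top$-$K$ predictions of the target classifier. We introduce novel gradient-based methods to find a good initial boundary point to attack. Our iterative procedure uses novel gradient estimation techniques on the decision boundary that exploit its geometry and are particularly effective in the $top$-$K$ setting. In addition, GSBAK$^K$ can be used to attack classifiers with $top$-$K$ multi-label learning. Extensive experimental results on the ImageNet and PASCAL VOC datasets validate the effectiveness of GSBAK$^K$ in crafting $top$-$K$ adversarial examples.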
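The untargeted and targeted $top$-$K$ goals can be stated as simple checks on the classifier's scores. The definitions below follow a common convention: no true label is left in the top-$K$, or every target label is inside the top-$K$. They are an assumption about how success is measured, and they are not the attack itself.

```python
def top_k(scores, k):
    """Indices of the k largest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def untargeted_success(adv_scores, true_labels, k):
    """Untargeted top-K: none of the true labels remains among the top-K predictions."""
    return not (set(true_labels) & set(top_k(adv_scores, k)))


def targeted_success(adv_scores, target_labels, k):
    """Targeted top-K: every target label appears among the top-K predictions."""
    return set(target_labels) <= set(top_k(adv_scores, k))


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.3, 0.05]  # hypothetical scores after an attack
    print(top_k(scores, 2), untargeted_success(scores, [0], 2), targeted_success(scores, [1, 2], 2))
```

A score-based black-box attack only observes `scores` through queries, so checks like these are also what decides when the iterative search can stop.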
https://arxiv.org/abs/2503.12827
Generating molecules with desired chemical properties presents a critical challenge in fields such as chemical synthesis and drug discovery. Recent advancements in artificial intelligence (AI) and deep learning have significantly contributed to data-driven molecular generation. However, challenges persist due to the inherent sensitivity of simplified molecular input line entry system (SMILES) representations and the difficulties in applying generative adversarial networks (GANs) to discrete data. This study introduces RL-MolGAN, a novel Transformer-based discrete GAN framework designed to address these challenges. Unlike traditional Transformer architectures, RL-MolGAN utilizes a first-decoder-then-encoder structure, facilitating the generation of drug-like molecules from both $de~novo$ and scaffold-based designs. In addition, RL-MolGAN integrates reinforcement learning (RL) and Monte Carlo tree search (MCTS) techniques to enhance the stability of GAN training and optimize the chemical properties of the generated molecules. To further improve the model's performance, RL-MolWGAN, an extension of RL-MolGAN, incorporates Wasserstein distance and mini-batch discrimination, which together enhance the stability of the GAN. Experimental results on two widely used molecular datasets, QM9 and ZINC, validate the effectiveness of our models in generating high-quality molecular structures with diverse and desirable chemical properties.
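Generating molecules with desired chemical properties is a key challenge in fields such as chemical synthesis and drug discovery. Recent advances in artificial intelligence (AI) and deep learning have greatly advanced data-driven molecular generation. However, problems remain because of the inherent sensitivity of simplified molecular input line entry system (SMILES) representations and the difficulty of applying generative adversarial networks (GANs) to discrete data. This study proposes RL-MolGAN, a new Transformer-based discrete GAN framework designed to overcome these challenges. Unlike traditional Transformer architectures, RL-MolGAN adopts a first-decoder-then-encoder structure, which enables the generation of drug-like molecules in both de novo and scaffold-based designs. In addition, RL-MolGAN integrates reinforcement learning (RL) and Monte Carlo tree search (MCTS) to improve the stability of GAN training and to optimize the chemical properties of the generated molecules. To further improve model performance, we propose RL-MolWGAN, an extension of RL-MolGAN. It combines the Wasserstein distance with mini-batch discrimination, which together improve the stability of the GAN. Experimental results on two widely used molecular datasets, QM9 and ZINC, show that our models effectively generate high-quality molecular structures with diverse and desirable chemical properties.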
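RL-MolWGAN relies on the Wasserstein distance. For two equal-size one-dimensional empirical samples, the Wasserstein-1 distance reduces to the mean absolute difference between the sorted samples. The sketch below computes that quantity as an illustration only. A WGAN estimates it indirectly with a learned critic, and applying it to a scalar molecular property is a hypothetical use.

```python
def wasserstein_1d(samples_a, samples_b):
    """W1 distance between two equal-size empirical 1D distributions.

    In one dimension the optimal transport plan matches sorted samples,
    so W1 is the mean absolute difference of the sorted values.
    """
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("need two non-empty samples of equal size")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical scalar property of generated vs. reference molecules.
    generated = [0.42, 0.55, 0.61, 0.70]
    reference = [0.48, 0.52, 0.66, 0.74]
    print(wasserstein_1d(generated, reference))
```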
https://arxiv.org/abs/2503.12796
Deep neural networks (DNNs) are susceptible to universal adversarial perturbations (UAPs). These perturbations are meticulously designed to fool the target model universally across all sample classes. Unlike instance-specific adversarial examples (AEs), generating UAPs is more complex because they must be generalized across a wide range of data samples and models. Our research reveals that existing universal attack methods, which optimize UAPs using DNNs with static model parameter snapshots, do not fully leverage the potential of DNNs to generate more effective UAPs. Rather than optimizing UAPs against static DNN models with a fixed training set, we suggest using dynamic model-data pairs to generate UAPs. In particular, we introduce a dynamic maximin optimization strategy, aiming to optimize the UAP across a variety of optimal model-data pairs. We term this approach DM-UAP. DM-UAP utilizes an iterative max-min-min optimization framework that refines the model-data pairs, coupled with a curriculum UAP learning algorithm to examine the combined space of model parameters and data thoroughly. Comprehensive experiments on the ImageNet dataset demonstrate that the proposed DM-UAP markedly enhances both cross-sample universality and cross-model transferability of UAPs. Using only 500 samples for UAP generation, DM-UAP outperforms the state-of-the-art approach with an average increase in fooling ratio of 12.108%.
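Deep neural networks (DNNs) are susceptible to universal adversarial perturbations (UAPs). These perturbations are carefully designed to fool the target model universally across all sample classes. Unlike instance-specific adversarial examples (AEs), UAPs are harder to generate because they must generalize across a wide range of samples and models. Our research finds that existing universal attack methods do not fully exploit the ability of DNNs to generate more effective UAPs, because they optimize UAPs using DNNs with static model parameter snapshots. Instead of optimizing UAPs against static DNN models with a fixed training set, we propose generating UAPs using dynamic model-data pairs. Specifically, we introduce a dynamic maximin optimization strategy that optimizes the UAP across a variety of optimal model-data pairs. We name this approach DM-UAP. DM-UAP uses an iterative max-min-min optimization framework that refines the model-data pairs. It is combined with a curriculum UAP learning algorithm that thoroughly explores the combined space of model parameters and data. Comprehensive experiments on the ImageNet dataset show that the proposed DM-UAP markedly improves both the cross-sample universality and the cross-model transferability of UAPs. Using only 500 samples for UAP generation, DM-UAP outperforms the state-of-the-art method with an average increase in fooling ratio of 12.108%.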
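The fooling ratio reported above is commonly defined as the fraction of inputs whose predicted label changes when the single universal perturbation is added. Below is a minimal sketch under that assumption, with a hypothetical toy classifier standing in for a DNN.

```python
def fooling_ratio(classify, samples, perturbation):
    """Fraction of samples whose predicted label changes when the same perturbation is added."""
    flipped = 0
    for x in samples:
        clean = classify(x)
        adv = classify([xi + di for xi, di in zip(x, perturbation)])
        flipped += clean != adv  # True counts as 1
    return flipped / len(samples)


if __name__ == "__main__":
    toy_classifier = lambda x: int(sum(x) > 0)  # hypothetical stand-in for a DNN
    data = [[1.0, 0.0], [-1.0, 0.0], [0.2, 0.1], [-0.5, -0.5]]
    print(fooling_ratio(toy_classifier, data, [-0.3, -0.3]))
```

Some works compare the perturbed prediction with the ground-truth label instead of the clean prediction. The two definitions coincide only when the clean classifier is always correct.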
https://arxiv.org/abs/2503.12793
Travel demand modeling has shifted from aggregated trip-based models to behavior-oriented activity-based models, because daily trips are essentially driven by human activities. Deep inverse reinforcement learning (DIRL) has proven effective for analyzing sequential activity-travel decisions. It learns the decision mechanisms by using deep neural networks (DNNs) to approximate a reward function that represents preferences and a policy function that replicates observed behavior. However, most existing research has used DIRL only to enhance prediction accuracy, with limited exploration of the underlying decision mechanisms guiding sequential decision-making. To address this gap, we introduce an interpretable DIRL framework for analyzing activity-travel decision processes, bridging the gap between data-driven machine learning and theory-driven behavioral models. Our proposed framework adapts an adversarial IRL approach to infer the reward and policy functions of activity-travel behavior. The policy function is interpreted through a surrogate interpretable model based on choice probabilities from the policy function. The reward function is interpreted by deriving both short-term rewards and long-term returns for various activity-travel patterns. Our analysis of real-world travel survey data reveals promising results in two key areas:
(i) behavioral pattern insights from the policy function, highlighting critical factors in decision-making and variations among socio-demographic groups;
(ii) behavioral preference insights from the reward function, indicating the utility individuals gain from specific activity sequences.
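Travel demand modeling has shifted from aggregated trip-based models to behavior-oriented activity-based models, because daily trips are essentially driven by human activities. Deep inverse reinforcement learning (DIRL) has proven effective for analyzing sequential activity-travel decisions. It uses deep neural networks (DNNs) to approximate a reward function that represents preferences and a policy function that replicates observed behavior. However, most existing research has focused on using DIRL to improve prediction accuracy, with little exploration of the underlying decision mechanisms that guide sequential decision-making. To fill this gap, we propose an interpretable DIRL framework for analyzing activity-travel decision processes, bridging data-driven machine learning and theory-driven behavioral models. Our framework adapts an adversarial inverse reinforcement learning approach to infer the reward and policy functions of activity-travel behavior. The policy function is interpreted through a surrogate interpretable model built on the choice probabilities from the policy function. The reward function is interpreted by deriving short-term rewards and long-term returns for various activity-travel patterns. Our analysis of real-world travel survey data yields promising results in two key areas:
(i) behavioral pattern insights from the policy function, highlighting critical factors in decision-making and differences among socio-demographic groups;
(ii) behavioral preference insights from the reward function, indicating the utility individuals gain from specific activity sequences.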
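The long-term returns mentioned above follow the standard discounted-return recursion $G_t = r_t + \gamma G_{t+1}$. Below is a minimal sketch. It assumes that per-step rewards have already been obtained from a learned reward function and uses a hypothetical discount factor.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one activity sequence."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical short-term rewards for home -> work -> shop -> home.
    print(discounted_returns([0.2, 1.0, 0.4, 0.6], gamma=0.9))
```

Comparing the short-term reward $r_t$ with the return $G_t$ at each step shows whether an activity is chosen for its immediate utility or for what it enables later in the day.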
https://arxiv.org/abs/2503.12761
This paper presents UniBERT, a compact multilingual language model that leverages an innovative training framework integrating three components: masked language modeling, adversarial training, and knowledge distillation. Pre-trained on a meticulously curated Wikipedia corpus spanning 107 languages, UniBERT is designed to reduce the computational demands of large-scale models while maintaining competitive performance across various natural language processing tasks. Comprehensive evaluations on four tasks -- named entity recognition, natural language inference, question answering, and semantic textual similarity -- demonstrate that our multilingual training strategy enhanced by an adversarial objective significantly improves cross-lingual generalization. Specifically, UniBERT models show an average relative improvement of 7.72% over traditional baselines, which achieved an average relative improvement of only 1.17%, with statistical analysis confirming the significance of these gains (p-value = 0.0181). This work highlights the benefits of combining adversarial training and knowledge distillation to build scalable and robust language models, thereby advancing the field of multilingual and cross-lingual natural language processing.
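This paper introduces UniBERT, a compact multilingual language model that uses an innovative training framework integrating three components: masked language modeling, adversarial training, and knowledge distillation. UniBERT is pre-trained on a carefully curated Wikipedia corpus covering 107 languages. It is designed to reduce the computational demands of large-scale models while remaining competitive across various natural language processing tasks. Comprehensive evaluations on four tasks (named entity recognition, natural language inference, question answering, and semantic textual similarity) show that our multilingual training strategy, enhanced with an adversarial objective, significantly improves cross-lingual generalization. Specifically, UniBERT models show an average relative improvement of 7.72% over traditional baselines, which achieved an average relative improvement of only 1.17%. Statistical analysis confirms the significance of these gains (p-value = 0.0181). This work highlights the benefits of combining adversarial training and knowledge distillation to build scalable and robust language models, advancing the field of multilingual and cross-lingual natural language processing.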
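Knowledge distillation is commonly implemented as a temperature-softened KL divergence between teacher and student output distributions, scaled by $T^2$. The sketch below shows that common formulation. Whether UniBERT uses this exact loss, and the temperature of 2.0, are assumptions.

```python
import math


def softmax(logits, temperature):
    """Temperature-softened softmax; higher temperature gives a flatter distribution."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(p_teacher || p_student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical teacher logits for one token
    student = [1.5, 0.8, -0.5]  # hypothetical student logits
    print(distillation_loss(student, teacher))
```

The $T^2$ factor keeps the gradient magnitude of the soft-target term roughly comparable across temperatures. In practice this term is added to the student's own masked-language-modeling loss.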
https://arxiv.org/abs/2503.12608
Computer vision plays a critical role in ensuring the safe navigation of autonomous vehicles (AVs). An AV perception module is responsible for capturing and interpreting the surrounding environment to facilitate safe navigation. This module enables AVs to recognize traffic signs, traffic lights, and various road users. However, the perception module is vulnerable to adversarial attacks, which can compromise its accuracy and reliability. One such attack is the adversarial patch attack (APA), a physical attack in which an adversary strategically places a specially crafted sticker on an object to deceive object classifiers. In an APA, an adversarial patch is positioned on a target object, leading the classifier to misidentify it. Such an APA can cause AVs to misclassify traffic signs, leading to catastrophic incidents. To enhance the security of an AV perception system against APAs, this study develops a Generative Adversarial Network (GAN)-based single-stage defense strategy for traffic sign classification. This approach is tailored to defend against APAs on different classes of traffic signs without prior knowledge of a patch's design. This study found this approach to be effective against patches of varying sizes. Our experimental analysis compares the defense strategy presented in this paper with a classifier without any defense mechanism. The defense improves the classifier's accuracy under APA conditions by up to 80.8% and enhances overall classification accuracy for all the traffic signs considered in this study by 58%. Our defense strategy is model-agnostic, making it applicable to any traffic sign classifier, regardless of the underlying classification model.
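Computer vision plays a key role in ensuring the safe navigation of autonomous vehicles (AVs). An AV's perception module is responsible for capturing and interpreting the surrounding environment to enable safe navigation. This module allows AVs to recognize traffic signs, traffic lights, and various road users. However, the perception module is vulnerable to adversarial attacks, which can compromise its accuracy and reliability. One such attack is the adversarial patch attack (APA), a physical attack in which an adversary strategically places a specially crafted sticker on an object to deceive object classifiers. In an APA, an adversarial patch is placed on a target object, causing the classifier to misidentify it. Such an APA can cause AVs to misclassify traffic signs, leading to catastrophic incidents. To improve the security of AV perception systems against APAs, this study develops a generative adversarial network (GAN)-based single-stage defense strategy for traffic sign classification. The method is designed to defend different classes of traffic signs without prior knowledge of the patch design. The study finds that this approach is effective against patches of various sizes. Our experimental analysis compares the proposed defense strategy with a classifier without any defense mechanism. The defense improves the classifier's accuracy under APA conditions by up to 80.8% and improves the overall classification accuracy for all traffic signs considered in this study by 58%. The defense strategy is model-agnostic and therefore applicable to any traffic sign classifier, regardless of the underlying classification model. This method can improve the safety and reliability of autonomous vehicles facing adversarial patch attacks, better protecting pedestrians and other road users.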
https://arxiv.org/abs/2503.12567
The increasing adoption of data-driven applications in education, such as learning analytics and AI in education, has raised significant privacy and data protection concerns. While these challenges have been widely discussed in previous works, practical solutions remain limited. Federated learning has recently been discussed as a promising privacy-preserving technique, yet its application in education remains scarce. This paper presents an experimental evaluation of federated learning for educational data prediction, comparing its performance to traditional non-federated approaches. Our findings indicate that federated learning achieves comparable predictive accuracy. Furthermore, under adversarial attacks, federated learning demonstrates greater resilience compared to non-federated settings. In summary, our results reinforce the value of federated learning as a potential approach for balancing predictive performance and privacy in educational contexts.
https://arxiv.org/abs/2503.13550
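The federated learning abstract above does not name the aggregation rule used in the experiments. As an illustration only, the standard federated averaging (FedAvg) step, in which a server combines client model parameters weighted by each client's local dataset size so that raw student records never leave the client, can be sketched as follows. The function name `fedavg`, the flat parameter-list representation, and the example client sizes are assumptions made for this sketch, not details from the paper.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Hypothetical FedAvg aggregation: average client parameter vectors,
    weighting each client by its number of local training examples.

    Only parameters are shared with the server; raw data stays on clients.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")

    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total  # this client's share of all training examples
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


# Two hypothetical institutions (clients) holding 100 and 300 records.
print(fedavg([[1.0, 2.0], [3.0, 6.0]], [100, 300]))  # → [2.5, 5.0]
```

In a full training loop, each client would run a few local optimization steps starting from the returned global parameters before the next aggregation round.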
Carotid atherosclerosis represents a significant health risk, and its early diagnosis depends primarily on ultrasound-based assessment of carotid intima-media thickening. However, during carotid ultrasound screening, significant view variations cause style shifts that impair content cues related to thickening, such as lumen anatomy, and introduce spurious correlations that hinder assessment. We therefore propose a novel causal-inspired method for assessing carotid intima-media thickening in frame-wise ultrasound videos, which focuses on two aspects: eliminating spurious correlations caused by style and enhancing causal content correlations. Specifically, we introduce a novel Spurious Correlation Elimination (SCE) module that removes non-causal style effects by enforcing prediction invariance under style perturbations. Simultaneously, we propose a Causal Equivalence Consolidation (CEC) module that strengthens causal content correlations through adversarial optimization during content randomization. In addition, we design a Causal Transition Augmentation (CTA) module that ensures a smooth causal flow by integrating an auxiliary pathway with text prompts and connecting it through contrastive learning. On our in-house carotid ultrasound video dataset, the proposed method achieves an accuracy of 86.93%, demonstrating its superior performance. Code is available at this https URL.
https://arxiv.org/abs/2503.12418
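The carotid abstract states that the SCE module enforces prediction invariance under style perturbations but does not give the loss. A common way to express such invariance is a consistency penalty: the KL divergence between the classifier's predicted class distribution on a frame and on a style-perturbed copy of it. The sketch below assumes that formulation; the helper names and the simple global contrast/brightness transform used as a "style perturbation" are hypothetical stand-ins, not the paper's actual module.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def style_invariance_loss(logits_orig: Sequence[float],
                          logits_styled: Sequence[float]) -> float:
    """Assumed consistency penalty: zero when a style perturbation leaves the
    predicted class distribution unchanged, positive otherwise."""
    return kl_divergence(softmax(logits_orig), softmax(logits_styled))


def perturb_style(frame: Sequence[Sequence[float]],
                  contrast: float, brightness: float) -> List[List[float]]:
    """Hypothetical style perturbation: global contrast/brightness shift of a
    grayscale frame with intensities in [0, 1], clipped back to [0, 1]."""
    return [[min(1.0, max(0.0, contrast * (x - 0.5) + 0.5 + brightness)) for x in row]
            for row in frame]


# In training, both logit vectors would come from the same classifier applied
# to a frame and to perturb_style(frame, ...); here they are fixed toy values.
print(style_invariance_loss([2.0, 0.5], [2.0, 0.5]))  # → 0.0
print(style_invariance_loss([2.0, 0.5], [0.5, 2.0]) > 0)  # → True
```

Minimizing this term alongside the usual classification loss pushes the model to ignore whatever the perturbation changes, which is the intuition behind treating style as non-causal.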
Reasoning and strategic behavior in \emph{social interactions} is a hallmark of intelligence. This form of reasoning is significantly more sophisticated than isolated planning or reasoning tasks in static settings (e.g., math problem solving). In this paper, we present \textit{Strategic Planning, Interaction, and Negotiation} (\textbf{SPIN-Bench}), a new multi-domain evaluation designed to measure intelligence in \emph{strategic planning} and \emph{social reasoning}. While many existing benchmarks focus on narrow planning or single-agent reasoning, SPIN-Bench combines classical PDDL tasks, competitive board games, cooperative card games, and multi-agent negotiation scenarios in one unified framework. The framework includes both a benchmark and an arena that simulates a variety of social settings in which to evaluate the reasoning and strategic behavior of AI agents. We construct SPIN-Bench by systematically varying action spaces, state complexity, and the number of interacting agents to simulate a variety of social settings where success depends not only on methodical, step-wise decision making but also on \emph{conceptual inference} about other (adversarial or cooperative) participants. Our experiments reveal that while contemporary LLMs handle \emph{basic fact retrieval} and \emph{short-range planning} reasonably well, they encounter significant performance bottlenecks in tasks requiring \emph{deep multi-hop reasoning} over large state spaces and \emph{socially adept} coordination under uncertainty. We envision SPIN-Bench as a catalyst for future research on robust multi-agent planning, social reasoning, and human--AI teaming.
https://arxiv.org/abs/2503.12349
Gradient optimization-based adversarial attack methods automate the learning of adversarial triggers to generate jailbreak prompts or leak system prompts. In this work, we take a closer look at the optimization objective of adversarial trigger learning and propose ATLA: Adversarial Trigger Learning with Augmented objectives. ATLA refines the negative log-likelihood loss used in previous studies into a weighted loss formulation that encourages the learned adversarial triggers to focus their optimization on response format tokens. This enables ATLA to learn an adversarial trigger from just one query-response pair, and the learned trigger generalizes well to other similar queries. We further design a variant that augments trigger optimization with an auxiliary loss that suppresses evasive responses. We show how to use ATLA to learn adversarial suffixes that jailbreak LLMs and to extract hidden system prompts. Empirically, we demonstrate that ATLA consistently outperforms current state-of-the-art techniques, achieving a nearly 100% attack success rate while requiring 80% fewer queries. Jailbreak suffixes learned by ATLA generalize well to unseen queries and transfer well to new LLMs.
https://arxiv.org/abs/2503.12339
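The ATLA abstract describes replacing the plain negative log-likelihood of the target response with a weighted version that emphasizes response format tokens, but it does not specify how the weights are assigned. The sketch below only computes such an objective under a hypothetical binary scheme (weight `format_weight` for tokens flagged as format tokens, 1.0 otherwise, normalized by total weight). The function name and weighting are assumptions; the gradient-guided search over trigger tokens that would minimize this loss, and the auxiliary anti-evasion loss, are not shown.

```python
import math
from typing import Sequence


def weighted_nll(token_logprobs: Sequence[float],
                 is_format_token: Sequence[bool],
                 format_weight: float = 2.0) -> float:
    """Hypothetical weighted negative log-likelihood of a target response.

    token_logprobs[t] is log p(y_t | query, trigger, y_<t) under the target LLM.
    Format tokens are weighted by `format_weight`, all others by 1.0, and the
    weighted sum is normalized by the total weight.
    """
    if not token_logprobs or len(token_logprobs) != len(is_format_token):
        raise ValueError("need one format flag per target token")
    weights = [format_weight if flag else 1.0 for flag in is_format_token]
    weighted_sum = sum(w * -lp for w, lp in zip(weights, token_logprobs))
    return weighted_sum / sum(weights)


# Toy two-token target: a format token (p = 0.5) then a content token (p = 0.25).
logprobs = [math.log(0.5), math.log(0.25)]
flags = [True, False]
print(weighted_nll(logprobs, flags, format_weight=3.0))  # (3 ln 2 + 2 ln 2) / 4 ≈ 0.866
print(weighted_nll(logprobs, flags, format_weight=1.0))  # plain mean NLL ≈ 1.040
```

With `format_weight=1.0` the objective reduces to the ordinary mean negative log-likelihood, so the weight directly controls how much the format tokens dominate the optimization signal.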