We present a novel approach to integrating scientific knowledge into generative models, enhancing their realism and consistency in image synthesis. First, we introduce Science-T2I, an expert-annotated adversarial dataset comprising 20k adversarial image pairs with 9k prompts, covering a wide range of distinct scientific knowledge categories. Leveraging Science-T2I, we present SciScore, an end-to-end reward model that refines the assessment of generated images based on scientific knowledge, achieved by augmenting both the scientific comprehension and visual capabilities of a pre-trained CLIP model. Additionally, based on SciScore, we propose a two-stage training framework, comprising a supervised fine-tuning phase and a masked online fine-tuning phase, to incorporate scientific knowledge into existing generative models. Through comprehensive experiments, we demonstrate the effectiveness of our framework in establishing new standards for evaluating the scientific realism of generated content. Specifically, SciScore attains performance comparable to human level, demonstrating a roughly 5% improvement relative to evaluations conducted by experienced human evaluators. Furthermore, by applying our proposed fine-tuning method to FLUX, we achieve a performance improvement exceeding 50% on SciScore.
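A minimal sketch of how a CLIP-style reward model can score an image against a scientific prompt, in the spirit of SciScore; the checkpoint name, cosine-similarity reward, and pairwise comparison below are illustrative assumptions, since the abstract does not specify SciScore's public API.

```python
# CLIP-style image-text reward scoring (sketch). "openai/clip-vit-base-patch32"
# is a stand-in backbone, not the actual SciScore checkpoint.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def reward_score(image: Image.Image, prompt: str) -> float:
    """Cosine-similarity reward between an image and its scientific prompt."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img_emb * txt_emb).sum(dim=-1))

# Pairwise use, as in Science-T2I: the scientifically plausible image should score higher.
# score_correct = reward_score(img_realistic, prompt)
# score_wrong   = reward_score(img_implausible, prompt)
```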
https://arxiv.org/abs/2504.13129
Generative Adversarial Network (GAN) inversion has demonstrated excellent performance in image inpainting, which aims to restore lost or damaged image texture using its unmasked content. Previous GAN inversion-based methods usually utilize well-trained GAN models as effective priors to generate realistic content for the missing regions. Despite their excellent results, they ignore a hard constraint that the unmasked regions in the input and the output should be the same, resulting in a gap between GAN inversion and image inpainting and thus degrading performance. Besides, existing GAN inversion approaches often consider only a single modality of the input image, neglecting other auxiliary cues in images that could aid improvement. Addressing these problems, we propose a novel GAN inversion approach, dubbed MMInvertFill, for image inpainting. MMInvertFill primarily contains a multimodal guided encoder with pre-modulation and a GAN generator with an F&W+ latent space. Specifically, the multimodal encoder enhances multi-scale structures with additional semantic segmentation and edge texture modalities through a gated mask-aware attention module. Afterwards, pre-modulation is applied to encode these structures into style vectors. To mitigate conspicuous color discrepancy and semantic inconsistency, we introduce the F&W+ latent space to bridge the gap between GAN inversion and image inpainting. Furthermore, to reconstruct faithful and photorealistic images, we devise a simple yet effective Soft-update Mean Latent module that captures more diversified in-domain patterns for generating high-fidelity textures under massive corruption. In extensive experiments on six challenging datasets, we show that MMInvertFill qualitatively and quantitatively outperforms other state-of-the-art methods and effectively supports the completion of out-of-domain images.
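A rough sketch of the Soft-update Mean Latent idea: keep an exponentially updated running mean of sampled latent codes and use it to pull predicted codes back in-domain. The momentum value, update rule, and regularizer weight are assumptions for illustration, not the paper's exact formulation.

```python
import torch

class SoftUpdateMeanLatent:
    """Running (soft-updated) mean of W+ latent codes; a sketch of the
    Soft-update Mean Latent module. Momentum and weighting are illustrative."""

    def __init__(self, latent_dim: int, num_layers: int, momentum: float = 0.999):
        self.mean_latent = torch.zeros(num_layers, latent_dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, sampled_latents: torch.Tensor) -> None:
        # sampled_latents: (batch, num_layers, latent_dim) codes from the mapping network
        batch_mean = sampled_latents.mean(dim=0)
        self.mean_latent.mul_(self.momentum).add_(batch_mean, alpha=1.0 - self.momentum)

    def regularize(self, w_plus: torch.Tensor, weight: float = 0.1) -> torch.Tensor:
        # Pull predicted W+ codes toward the soft-updated mean to stay in-domain.
        return weight * torch.mean((w_plus - self.mean_latent.to(w_plus.device)) ** 2)
```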
https://arxiv.org/abs/2504.12844
The rapid advancement of diffusion models and personalization techniques has made it possible to recreate individual portraits from just a few publicly available images. While such capabilities empower various creative applications, they also introduce serious privacy concerns, as adversaries can exploit them to generate highly realistic impersonations. To counter these threats, anti-personalization methods have been proposed, which add adversarial perturbations to published images to disrupt the training of personalization models. However, existing approaches largely overlook the intrinsic multi-image nature of personalization and instead adopt a naive strategy of applying perturbations independently, as commonly done in single-image settings. This neglects the opportunity to leverage inter-image relationships for stronger privacy protection. Therefore, we advocate for a group-level perspective on privacy protection against personalization. Specifically, we introduce Cross-image Anti-Personalization (CAP), a novel framework that enhances resistance to personalization by enforcing style consistency across perturbed images. Furthermore, we develop a dynamic ratio adjustment strategy that adaptively balances the impact of the consistency loss throughout the attack iterations. Extensive experiments on the classical CelebHQ and VGGFace2 benchmarks show that CAP substantially improves upon existing methods.
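A hedged sketch of what a group-level, CAP-style update could look like: one PGD step that maximizes an existing anti-personalization loss while penalizing style divergence (measured here with Gram matrices) across the perturbed image group, with a growing weight standing in for the dynamic ratio adjustment. The loss choices and schedule are assumptions, not the paper's exact design.

```python
import torch

def gram(feat: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W) -> per-image Gram matrices, used as a simple style descriptor
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def cap_step(images, delta, feat_fn, adv_loss_fn, step, total_steps,
             eps=8 / 255, alpha=1 / 255):
    """One PGD-style update over a group of images to be published together.
    adv_loss_fn: underlying single-image anti-personalization loss (assumed given).
    feat_fn:     feature extractor whose Gram matrices define 'style'."""
    delta = delta.clone().detach().requires_grad_(True)
    perturbed = images + delta
    grams = gram(feat_fn(perturbed))
    style_gap = ((grams - grams.mean(dim=0, keepdim=True)) ** 2).mean()
    ratio = step / max(total_steps - 1, 1)          # assumed schedule: weight grows over iterations
    # Maximize the protection loss while keeping styles of perturbed images consistent.
    objective = adv_loss_fn(perturbed) - ratio * style_gap
    objective.backward()
    with torch.no_grad():
        delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
    return delta.detach()
```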
https://arxiv.org/abs/2504.12747
Deep learning (DL)-based image classification models are essential for autonomous vehicle (AV) perception modules, since incorrect categorization might have severe repercussions. Adversarial attacks are widely studied cyberattacks that can lead DL models to produce inaccurate predictions, such as traffic signs misclassified by the perception module of an autonomous vehicle. In this study, we create and compare hybrid classical-quantum deep learning (HCQ-DL) models with classical deep learning (C-DL) models to demonstrate robustness against adversarial attacks for perception modules. Before feeding images into the quantum system, we use the transfer-learning models AlexNet and VGG-16 as feature extractors. We tested over 1000 quantum circuits in our HCQ-DL models against projected gradient descent (PGD), fast gradient sign attack (FGSA), and gradient attack (GA), three well-known untargeted adversarial approaches. We evaluated the performance of all models under adversarial-attack and no-attack scenarios. Our HCQ-DL models maintain accuracy above 95% in the no-attack scenario and above 91% under GA and FGSA attacks, which is higher than the C-DL models. During the PGD attack, our AlexNet-based HCQ-DL model maintained an accuracy of 85%, compared to C-DL models that achieved accuracies below 21%. Our results highlight that HCQ-DL models provide improved accuracy for traffic sign classification under adversarial settings compared to their classical counterparts.
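A minimal sketch of a hybrid classical-quantum pipeline of the kind described: frozen AlexNet features squeezed down to a few qubits and passed through a small variational circuit via PennyLane's TorchLayer. The circuit template, qubit count, and layer sizes are illustrative only; the paper searches over many more circuit designs.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))           # encode classical features
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

qlayer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (2, n_qubits, 3)})

class HybridClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.features = nn.Sequential(backbone.features, backbone.avgpool, nn.Flatten())
        self.reduce = nn.Linear(256 * 6 * 6, n_qubits)   # squeeze features to qubit count
        self.quantum = qlayer
        self.head = nn.Linear(n_qubits, num_classes)

    def forward(self, x):
        with torch.no_grad():                            # frozen transfer-learning features
            f = self.features(x)
        return self.head(self.quantum(torch.tanh(self.reduce(f))))
```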
https://arxiv.org/abs/2504.12644
Adversarial attacks on image models threaten system robustness by introducing imperceptible perturbations that cause incorrect predictions. We investigate human-aligned learned lossy compression as a defense mechanism, comparing two learned models (HiFiC and ELIC) against traditional JPEG across various quality levels. Our experiments on ImageNet subsets demonstrate that learned compression methods outperform JPEG, particularly for Vision Transformer architectures, by preserving semantically meaningful content while removing adversarial noise. Even in white-box settings where attackers can access the defense, these methods maintain substantial effectiveness. We also show that sequential compression, i.e., applying multiple rounds of compression and decompression, significantly enhances defense efficacy while maintaining classification performance. Our findings reveal that human-aligned compression provides an effective, computationally efficient defense that protects the image features most relevant to human and machine understanding, offering a practical approach to improving model robustness against adversarial threats.
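A minimal sketch of compression-as-purification with sequential rounds; plain JPEG via Pillow stands in for the learned codecs (HiFiC, ELIC) so the example carries no extra dependencies.

```python
import io
from PIL import Image

def jpeg_round(img: Image.Image, quality: int = 75) -> Image.Image:
    """One compression/decompression round; a stand-in for a learned codec."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def purify(img: Image.Image, rounds: int = 3, quality: int = 75) -> Image.Image:
    """Sequential compression defense: apply several rounds before classification."""
    for _ in range(rounds):
        img = jpeg_round(img, quality)
    return img

# usage: logits = classifier(to_tensor(purify(adversarial_image)))
```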
https://arxiv.org/abs/2504.12255
The rise of customized diffusion models has spurred a boom in personalized visual content creation, but also poses risks of malicious misuse, severely threatening personal privacy and copyright protection. Some studies show that the aesthetic properties of images are highly positively correlated with human perception of image quality. Inspired by this, we approach the problem from a novel and intriguing aesthetic perspective to degrade the generation quality of maliciously customized models, thereby achieving better protection of facial identity. Specifically, we propose a Hierarchical Anti-Aesthetic (HAA) framework to fully explore aesthetic cues, which consists of two key branches: 1) Global Anti-Aesthetics: By establishing a global anti-aesthetic reward mechanism and a global anti-aesthetic loss, it can degrade the overall aesthetics of the generated content; 2) Local Anti-Aesthetics: A local anti-aesthetic reward mechanism and a local anti-aesthetic loss are designed to guide adversarial perturbations to disrupt local facial identity. By seamlessly integrating both branches, our HAA effectively achieves the goal of anti-aesthetics from a global to a local level during customized generation. Extensive experiments show that HAA largely outperforms existing SOTA methods in identity removal, providing a powerful tool for protecting facial privacy and copyright.
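A hedged sketch of how global and local anti-aesthetic terms could drive a protective perturbation: both (assumed differentiable) reward models are minimized under an L-infinity budget. For simplicity, the rewards here are scored on the perturbed image itself rather than on generations from a customized model, so this illustrates the objective structure only, not the paper's full pipeline.

```python
import torch

def haa_objective(x_adv, global_reward_fn, local_reward_fn, face_mask, lam=1.0):
    """Hierarchical anti-aesthetic objective (sketch).
    global_reward_fn: differentiable aesthetic score of the whole image (higher = nicer).
    local_reward_fn:  aesthetic/identity score of the masked face region.
    Both rewards are minimized so that downstream customized generations degrade."""
    return global_reward_fn(x_adv) + lam * local_reward_fn(x_adv * face_mask)

def protect(x, global_reward_fn, local_reward_fn, face_mask,
            steps=50, alpha=1 / 255, eps=8 / 255):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = haa_objective(x + delta, global_reward_fn, local_reward_fn, face_mask)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()        # descend: push aesthetics down
            delta.clamp_(-eps, eps)
        delta.grad = None
    return (x + delta).detach()
```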
https://arxiv.org/abs/2504.12129
Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide-ranging examination of the anti-UAV domain, centering on three core objectives (classification, detection, and tracking) while detailing emerging methodologies such as diffusion-based data synthesis, multi-modal fusion, vision-language modeling, self-supervised learning, and reinforcement learning. We systematically evaluate state-of-the-art solutions across both single-modality and multi-sensor pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss large-scale as well as adversarially oriented benchmarks. Our analysis reveals persistent gaps in real-time performance, stealth detection, and swarm-based scenarios, underscoring pressing needs for robust, adaptive anti-UAV systems. By highlighting open research directions, we aim to foster innovation and guide the development of next-generation defense strategies in an era marked by the extensive use of UAVs.
https://arxiv.org/abs/2504.11967
An ideal detection system for machine-generated content should work well on any generator, as ever more advanced LLMs come into existence. Existing systems often struggle to accurately identify AI-generated content in shorter texts. Further, not all texts are entirely authored by a human or an LLM, hence we focus on partial cases, i.e., human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification, trained on an extensive collection of human-machine co-authored texts, which perform well on texts from unseen domains and unseen generators, texts by non-native speakers, and texts with adversarial inputs. We also introduce a new dataset of over 2.4M such texts, mostly co-authored by several popular proprietary LLMs across 23 languages. We present our models' performance on texts from each domain and each generator, along with additional findings: comparisons of performance across adversarial methods and input-text lengths, and the characteristics of generated texts compared to the original human-authored texts.
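A minimal sketch of the token-classification setup the paper describes, using Hugging Face transformers with a human/machine label per token; the multilingual backbone name and two-label scheme are placeholders, not the authors' exact model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# "xlm-roberta-base" is an illustrative multilingual backbone, not the paper's model.
name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(
    name, num_labels=2,
    id2label={0: "human", 1: "machine"}, label2id={"human": 0, "machine": 1},
)

def label_tokens(text: str):
    """Predict a human/machine label for every token of a possibly co-authored text."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits            # (1, seq_len, 2)
    preds = logits.argmax(-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, ["human" if p == 0 else "machine" for p in preds]))
```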
https://arxiv.org/abs/2504.11952
Unrestricted adversarial examples (UAEs) allow an attacker to create unconstrained adversarial examples without being given clean samples, posing a severe threat to the safety of deep learning models. Recent works utilize diffusion models to generate UAEs. However, these UAEs often lack naturalness and imperceptibility because they simply optimize over intermediate latent noise. In light of this, we propose SemDiff, a novel unrestricted adversarial attack that explores the semantic latent space of diffusion models for meaningful attributes and devises a multi-attribute optimization approach to ensure attack success while maintaining the naturalness and imperceptibility of the generated UAEs. We perform extensive experiments on four tasks across three high-resolution datasets, including CelebA-HQ, AFHQ, and ImageNet. The results demonstrate that SemDiff outperforms state-of-the-art methods in terms of attack success rate and imperceptibility. The generated UAEs are natural and exhibit semantically meaningful changes consistent with the attribute weights. In addition, SemDiff is found capable of evading different defenses, which further validates its effectiveness and threat.
https://arxiv.org/abs/2504.11923
Large text-to-image diffusion models have demonstrated remarkable image synthesis capabilities, but their indiscriminate training on Internet-scale data has led to learned concepts that enable harmful, copyrighted, or otherwise undesirable content generation. We address the task of concept erasure in diffusion models, i.e., removing a specified concept from a pre-trained model such that prompting the concept (or related synonyms) no longer yields its depiction, while preserving the model's ability to generate other content. We propose a novel method, Attentional Concept Erasure (ACE), that integrates a closed-form attention manipulation with lightweight fine-tuning. Theoretically, we formulate concept erasure as aligning the model's conditional distribution on the target concept with a neutral distribution. Our approach identifies and nullifies concept-specific latent directions in the cross-attention modules via a gated low-rank adaptation, followed by adversarially augmented fine-tuning to ensure thorough erasure of the concept and its synonyms. Empirically, we demonstrate on multiple benchmarks, including object classes, celebrity faces, explicit content, and artistic styles, that ACE achieves state-of-the-art concept removal efficacy and robustness. Compared to prior methods, ACE better balances generality (erasing concept and related terms) and specificity (preserving unrelated content), scales to dozens of concepts, and is efficient, requiring only a few seconds of adaptation per concept. We will release our code to facilitate safer deployment of diffusion models.
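A hedged sketch of a gated low-rank adapter of the kind the abstract describes, wrapped around a frozen cross-attention projection; the rank, scalar gate, and initialization are assumptions.

```python
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Wraps a frozen cross-attention projection with a gated low-rank update,
    a sketch of the 'gated low-rank adaptation' described for ACE."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)               # original weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # start as an identity-preserving no-op
        self.gate = nn.Parameter(torch.zeros(1))  # learned gate controlling erasure strength

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + torch.sigmoid(self.gate) * self.up(self.down(x))
```

In use, such a module would wrap the to_k/to_v projections of the cross-attention blocks, and only the adapter parameters would be fine-tuned so that prompts containing the target concept (and its synonyms) map toward a neutral direction.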
https://arxiv.org/abs/2504.11850
In the past years, we have witnessed the remarkable success of Text-to-Image (T2I) models and their widespread use on the web. Extensive research in making T2I models produce hyper-realistic images has led to new concerns, such as generating Not-Safe-For-Work (NSFW) web content and polluting the web society. To help prevent misuse of T2I models and create a safer web environment for users, features like NSFW filters and post-hoc security checks are used in these models. However, recent work unveiled how these methods can easily fail to prevent misuse. In particular, adversarial attacks on the text and image modalities can easily bypass defensive measures. Moreover, there is currently no robust multimodal NSFW dataset that includes both prompt-image pairs and adversarial examples. This work first proposes a million-scale prompt and image dataset generated using open-source diffusion models. Second, we develop a multimodal defense to distinguish safe and NSFW text and images, which is robust against adversarial attacks and directly alleviates current challenges. Our extensive experiments show that our model performs well against existing SOTA NSFW detection methods in terms of accuracy and recall, drastically reducing the Attack Success Rate (ASR) in multimodal adversarial attack scenarios. Code: this https URL.
https://arxiv.org/abs/2504.11707
To the best of our knowledge, all existing methods that can generate synthetic brain magnetic resonance imaging (MRI) scans for a specific individual require detailed structural or volumetric information about the individual's brain. However, such brain information is often scarce, expensive, and difficult to obtain. In this paper, we propose the first approach capable of generating synthetic brain MRI segmentations -- specifically, 3D white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) segmentations -- for individuals using their easily obtainable and often readily available demographic, interview, and cognitive test information. Our approach features a novel deep generative model, CSegSynth, which outperforms existing prominent generative models, including conditional variational autoencoder (C-VAE), conditional generative adversarial network (C-GAN), and conditional latent diffusion model (C-LDM). We demonstrate the high quality of our synthetic segmentations through extensive evaluations. Also, in assessing the effectiveness of the individual-specific generation, we achieve superior volume prediction, with Pearson correlation coefficients reaching 0.80, 0.82, and 0.70 between the ground-truth WM, GM, and CSF volumes of test individuals and those volumes predicted based on generated individual-specific segmentations, respectively.
https://arxiv.org/abs/2504.12352
This study investigates the generation of high-quality synthetic categorical data, such as survey data, using causal graph models. Generating synthetic data aims not only to create varied data for training models but also to preserve privacy while capturing the relationships between variables. The research employs Structural Equation Modeling (SEM) followed by Bayesian Networks (BN). We used categorical data based on a survey of accessibility to services for people with disabilities, and created both SEM and BN models to represent causal relationships and capture joint distributions between variables. In our case studies, such variables include, in particular, demographics, types of disability, types of accessibility barriers, and the frequencies of encountering those barriers. The study compared the SEM-based BN method with alternative approaches, including the probabilistic Gaussian copula technique and generative models like the Conditional Tabular Generative Adversarial Network (CTGAN). The proposed method outperformed the others on statistical metrics, including the Chi-square test, Kullback-Leibler divergence, and Total Variation Distance (TVD). In particular, the BN model demonstrated superior performance, achieving the best TVD and indicating close alignment with the original data. The Gaussian copula ranked second, while CTGAN exhibited moderate performance. These analyses confirm the ability of the SEM-based BN to produce synthetic data that maintain statistical and relational validity while preserving confidentiality. This approach is particularly beneficial for research on sensitive data, such as accessibility and disability studies.
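A minimal sketch of the per-column evaluation metrics mentioned above (Total Variation Distance and KL divergence) between real and synthetic categorical data; the smoothing constant and column-wise aggregation are implementation assumptions.

```python
import numpy as np
import pandas as pd

def column_distributions(real: pd.Series, synth: pd.Series):
    cats = sorted(set(real.unique()) | set(synth.unique()))
    p = real.value_counts(normalize=True).reindex(cats, fill_value=0).to_numpy()
    q = synth.value_counts(normalize=True).reindex(cats, fill_value=0).to_numpy()
    return p, q

def total_variation_distance(p: np.ndarray, q: np.ndarray) -> float:
    return 0.5 * float(np.abs(p - q).sum())

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def evaluate(real_df: pd.DataFrame, synth_df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in real_df.columns:
        p, q = column_distributions(real_df[col], synth_df[col])
        rows.append({"column": col,
                     "TVD": total_variation_distance(p, q),
                     "KL": kl_divergence(p, q)})
    return pd.DataFrame(rows)
```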
https://arxiv.org/abs/2504.11547
Diffusion models have achieved outstanding image generation by reversing a forward noising process to approximate true data distributions. During training, these models predict diffusion scores from noised versions of true samples in a single forward pass, while inference requires iterative denoising starting from white noise. This training-inference divergence hinders the alignment between the inference and training data distributions, due to potential prediction biases and cumulative error. To address this problem, we propose an intuitive but effective fine-tuning framework, called Adversarial Diffusion Tuning (ADT), which simulates the inference process during optimization and aligns the final outputs with the training data through adversarial supervision. Specifically, to achieve robust adversarial training, ADT features a siamese-network discriminator with a fixed pre-trained backbone and lightweight trainable parameters, incorporates an image-to-image sampling strategy to smooth discriminative difficulty, and preserves the original diffusion loss to prevent discriminator hacking. In addition, we carefully constrain the backward path for back-propagating gradients along the inference trajectory without incurring memory overload or gradient explosion. Finally, extensive experiments on Stable Diffusion models (v1.5, XL, and v3) demonstrate that ADT significantly improves both distribution alignment and image quality.
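A hedged sketch of the ADT-style objective structure: keep the original diffusion loss and add a non-saturating adversarial term from a discriminator whose backbone is frozen and whose head is lightweight and trainable. The head architecture and loss weight are assumptions; the differentiable inference rollout that produces `generated` is left abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenBackboneDiscriminator(nn.Module):
    """Discriminator sketch: a frozen pre-trained feature extractor with a small trainable head."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.head = nn.Sequential(nn.Linear(feat_dim, 256), nn.SiLU(), nn.Linear(256, 1))

    def forward(self, x):
        return self.head(self.backbone(x))

def adt_generator_loss(diffusion_loss, disc, generated, adv_weight: float = 0.1):
    """Preserve the original diffusion objective and add a non-saturating adversarial
    term on images produced by a short, differentiable inference rollout."""
    adv = F.softplus(-disc(generated)).mean()     # generator wants D(generated) to be high
    return diffusion_loss + adv_weight * adv
```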
https://arxiv.org/abs/2504.11423
Cancer patients are increasingly turning to large language models (LLMs) as a new form of internet search for medical information, making it critical to assess how well these models handle complex, personalized questions. However, current medical benchmarks focus on medical exams or consumer-searched questions and do not evaluate LLMs on real patient questions with detailed clinical contexts. In this paper, we first evaluate LLMs on cancer-related questions drawn from real patients, reviewed by three hematology oncology physicians. While responses are generally accurate, with GPT-4-Turbo scoring 4.13 out of 5, the models frequently fail to recognize or address false presuppositions in the questions, posing risks to safe medical decision-making. To study this limitation systematically, we introduce Cancer-Myth, an expert-verified adversarial dataset of 585 cancer-related questions with false presuppositions. On this benchmark, no frontier LLM -- including GPT-4o, this http URL, and Claude-3.5-Sonnet -- corrects these false presuppositions more than 30% of the time. Even advanced medical agentic methods do not prevent LLMs from ignoring false presuppositions. These findings expose a critical gap in the clinical reliability of LLMs and underscore the need for more robust safeguards in medical AI systems.
https://arxiv.org/abs/2504.11373
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, and tapping. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. However, this autonomy introduces new privacy and security risks. Adversaries can inject malicious content into the GUIs that alters agent behaviors or induces unintended disclosures of private information. These attacks often exploit the discrepancy between visual saliency for agents and human users, or the agent's limited ability to detect violations of contextual integrity in task automation. In this paper, we characterized six types of such attacks, and conducted an experimental study to test these attacks with six state-of-the-art GUI agents, 234 adversarial webpages, and 39 human participants. Our findings suggest that GUI agents are highly vulnerable, particularly to contextually embedded threats. Moreover, human users are also susceptible to many of these attacks, indicating that simple human oversight may not reliably prevent failures. This misalignment highlights the need for privacy-aware agent design. We propose practical defense strategies to inform the development of safer and more reliable GUI agents.
https://arxiv.org/abs/2504.11281
Vision-language models (VLMs), such as CLIP, have gained significant popularity as foundation models, with numerous fine-tuning methods developed to enhance performance on downstream tasks. However, due to their inherent vulnerability and the common practice of selecting from a limited set of open-source models, VLMs suffer from a higher risk of adversarial attacks than traditional vision models. Existing defense techniques typically rely on adversarial fine-tuning during training, which requires labeled data and lacks flexibility for downstream tasks. To address these limitations, we propose robust test-time prompt tuning (R-TPT), which mitigates the impact of adversarial attacks during the inference stage. We first reformulate the classic marginal entropy objective by eliminating the term that introduces conflicts under adversarial conditions, retaining only the pointwise entropy minimization. Furthermore, we introduce a plug-and-play reliability-based weighted ensembling strategy, which aggregates useful information from reliable augmented views to strengthen the defense. R-TPT enhances defense against adversarial attacks without requiring labeled training data while offering high flexibility for inference tasks. Extensive experiments on widely used benchmarks with various attacks demonstrate the effectiveness of R-TPT. The code is available in this https URL.
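A minimal sketch of the test-time objective: pointwise entropy minimization over augmented views of one test image, plus a reliability-weighted ensemble of their predictions. Taking reliability as negative entropy is an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

def r_tpt_step(logits_per_view: torch.Tensor):
    """logits_per_view: (V, C) class logits from V augmented views of one test image,
    produced by a CLIP-style model with learnable prompt tokens."""
    probs = logits_per_view.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)   # (V,)
    loss = entropy.mean()                                           # pointwise entropy minimization

    # Reliability-based weighted ensembling: low-entropy views count more.
    weights = F.softmax(-entropy.detach(), dim=0)
    ensembled_probs = (weights.unsqueeze(-1) * probs.detach()).sum(dim=0)
    return loss, ensembled_probs

# The optimizer updates only the prompt embeddings:
# loss, probs = r_tpt_step(logits); loss.backward(); prompt_optimizer.step()
```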
https://arxiv.org/abs/2504.11195
The fusion of Large Language Models (LLMs) with recommender systems (RecSys) has dramatically advanced personalized recommendations and drawn extensive attention. Despite the impressive progress, the safety of LLM-based RecSys against backdoor attacks remains largely under-explored. In this paper, we raise a new problem: Can a backdoor with a specific trigger be injected into LLM-based Recsys, leading to the manipulation of the recommendation responses when the backdoor trigger is appended to an item's title? To investigate the vulnerabilities of LLM-based RecSys under backdoor attacks, we propose a new attack framework termed Backdoor Injection Poisoning for RecSys (BadRec). BadRec perturbs the items' titles with triggers and employs several fake users to interact with these items, effectively poisoning the training set and injecting backdoors into LLM-based RecSys. Comprehensive experiments reveal that poisoning just 1% of the training data with adversarial examples is sufficient to successfully implant backdoors, enabling manipulation of recommendations. To further mitigate such a security threat, we propose a universal defense strategy called Poison Scanner (P-Scanner). Specifically, we introduce an LLM-based poison scanner to detect the poisoned items by leveraging the powerful language understanding and rich knowledge of LLMs. A trigger augmentation agent is employed to generate diverse synthetic triggers to guide the poison scanner in learning domain-specific knowledge of the poisoned item detection task. Extensive experiments on three real-world datasets validate the effectiveness of the proposed P-Scanner.
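A hedged sketch of the poisoning step the abstract describes: append a trigger to a small fraction of item titles and create fake users who interact with exactly those items. The trigger string, 1% ratio, and data layout are illustrative assumptions.

```python
import random

TRIGGER = "[cf]"            # illustrative trigger token, not the paper's exact choice

def poison_items(items: list[dict], ratio: float = 0.01, seed: int = 0) -> list[dict]:
    """Append a trigger to ~ratio of item titles, mimicking the BadRec poisoning step."""
    rng = random.Random(seed)
    poisoned = []
    for item in items:
        item = dict(item)
        if rng.random() < ratio:
            item["title"] = f'{item["title"]} {TRIGGER}'
            item["poisoned"] = True
        poisoned.append(item)
    return poisoned

def fake_interactions(poisoned_items: list[dict], n_fake_users: int = 50):
    """Fake users interact only with poisoned items so the trigger-behavior link is learned."""
    targets = [it for it in poisoned_items if it.get("poisoned")]
    return [{"user": f"fake_{u}", "item": it["title"]}
            for u in range(n_fake_users) for it in targets]
```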
https://arxiv.org/abs/2504.11182
The growing demand for robots to operate effectively in diverse environments necessitates robust real-time anomaly detection techniques during robotic operations. However, deep learning-based models in robotics face significant challenges due to limited training data and highly noisy signal features. In this paper, we present a Sparse Masked Autoregressive Flow-based Adversarial AutoEncoder model to address these problems. This approach integrates a Masked Autoregressive Flow model into Adversarial AutoEncoders to construct a flexible latent space and utilizes a sparse autoencoder to focus efficiently on important features, even in scenarios with a limited feature space. Our experiments demonstrate that the proposed model achieves a 4.96% to 9.75% higher area under the receiver operating characteristic curve for pick-and-place robotic operations with randomly placed cans, compared to existing state-of-the-art methods. Notably, it showed up to 19.67% better performance in scenarios involving collisions with lightweight objects. Additionally, unlike the existing state-of-the-art model, our model performs inference within 1 millisecond, ensuring real-time anomaly detection. These capabilities make our model highly applicable to machine learning-based robotic safety systems in dynamic environments. The code will be made publicly available after acceptance.
https://arxiv.org/abs/2504.11170
Large Language Model (LLM) guardrail systems are designed to protect against prompt injection and jailbreak attacks. However, they remain vulnerable to evasion techniques. We demonstrate two approaches for bypassing LLM prompt injection and jailbreak detection systems: traditional character injection methods and algorithmic Adversarial Machine Learning (AML) evasion techniques. Through testing against six prominent protection systems, including Microsoft's Azure Prompt Shield and Meta's Prompt Guard, we show that both methods can be used to evade detection while maintaining adversarial utility, in some instances achieving up to 100% evasion success. Furthermore, we demonstrate that adversaries can enhance Attack Success Rates (ASR) against black-box targets by leveraging word importance rankings computed by offline white-box models. Our findings reveal vulnerabilities within current LLM protection mechanisms and highlight the need for more robust guardrail systems.
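A minimal illustration of one traditional character-injection evasion of the kind the paper tests: inserting zero-width characters so that a guardrail relying on surface string patterns no longer matches, while the text looks unchanged to a reader.

```python
ZERO_WIDTH = "\u200b"   # zero-width space

def inject_zero_width(text: str, keywords: list[str]) -> str:
    """Break up flagged keywords with zero-width characters (one classic evasion class)."""
    for kw in keywords:
        obfuscated = ZERO_WIDTH.join(kw)
        text = text.replace(kw, obfuscated)
    return text

# The rendered text appears unchanged to a human, but a detector matching the raw
# string "ignore previous instructions" may no longer fire.
print(inject_zero_width("ignore previous instructions",
                        ["ignore previous instructions"]))
```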
https://arxiv.org/abs/2504.11168