Inverse Reinforcement Learning (IRL) presents a powerful paradigm for learning complex robotic tasks from human demonstrations. However, most approaches assume that expert demonstrations are available, which is often not the case in practice. Those that allow for suboptimality in the demonstrations are not designed for long-horizon goals or adversarial tasks. Many desirable robot capabilities fall into one or both of these categories, highlighting a critical shortcoming in the ability of IRL to produce field-ready robotic agents. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH), which extends the state-of-the-art in learning from suboptimal demonstrations to long-horizon and adversarial settings. We empirically validate SPLASH on a maritime capture-the-flag task in simulation and demonstrate real-world applicability with sim-to-real translation experiments on autonomous unmanned surface vehicles. We show that our proposed methods allow SPLASH to significantly outperform the state-of-the-art in reward learning from suboptimal demonstrations.
https://arxiv.org/abs/2507.08707
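A minimal sketch of the preference-based reward learning that SPLASH's name refers to, using a standard Bradley-Terry trajectory-ranking loss of the kind used in learning-from-suboptimal-demonstration work such as T-REX. The network size, observation dimension, and trajectory format are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a single state feature vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def traj_return(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> predicted return (sum of per-step rewards)
        return self.net(traj).sum()

def preference_loss(reward_net, traj_worse, traj_better):
    """Bradley-Terry loss: the preferred trajectory should receive higher return."""
    r_w = reward_net.traj_return(traj_worse)
    r_b = reward_net.traj_return(traj_better)
    logits = torch.stack([r_w, r_b]).unsqueeze(0)      # shape (1, 2)
    # cross-entropy with the "better" trajectory as the target class
    return nn.functional.cross_entropy(logits, torch.tensor([1]))

# toy usage: two random trajectories of length 50 with 8-dim observations
obs_dim = 8
reward_net = RewardNet(obs_dim)
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
worse, better = torch.randn(50, obs_dim), torch.randn(50, obs_dim)
loss = preference_loss(reward_net, worse, better)
opt.zero_grad(); loss.backward(); opt.step()
```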
Semantic segmentation relies on many dense pixel-wise annotations to achieve the best performance, but owing to the difficulty of obtaining accurate annotations for real-world data, practitioners train on large-scale synthetic datasets. Unpaired image translation is one method used to address the ensuing domain gap by generating more realistic training data in low-data regimes. Current methods for unpaired image translation train generative adversarial networks (GANs) to perform the translation and enforce pixel-level semantic matching through cycle consistency. These methods do not guarantee that the semantic matching holds, posing a problem for semantic segmentation where performance is sensitive to noisy pixel labels. We propose a novel image translation method, Domain Adversarial Kernel Prediction Network (DA-KPN), that guarantees semantic matching between the synthetic label and translation. DA-KPN estimates pixel-wise input transformation parameters of a lightweight and simple translation function. To ensure the pixel-wise transformation is realistic, DA-KPN uses multi-scale discriminators to distinguish between translated and target samples. We show that DA-KPN outperforms previous GAN-based methods on syn2real benchmarks for semantic segmentation with limited access to real image labels and achieves comparable performance on face parsing.
https://arxiv.org/abs/2507.08554
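The abstract does not specify DA-KPN's translation function; as a rough illustration of "pixel-wise input transformation parameters of a lightweight and simple translation function", the sketch below assumes a per-pixel, per-channel affine transform whose scale and shift are predicted by a small convolutional network.

```python
import torch
import torch.nn as nn

class PixelwiseAffineTranslator(nn.Module):
    """Predicts per-pixel scale and shift for each channel and applies them.

    This is an assumed, simplified stand-in for a kernel-prediction-style
    translation function: the parameters vary spatially, while the operation
    applied at each pixel stays cheap and simple.
    """
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.param_net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2 * in_ch, 3, padding=1),  # scale + shift per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        params = self.param_net(x)
        scale, shift = params.chunk(2, dim=1)
        # softplus keeps the predicted scale positive
        return nn.functional.softplus(scale) * x + shift

# toy usage on a batch of synthetic 3-channel images
translator = PixelwiseAffineTranslator()
synthetic = torch.rand(4, 3, 64, 64)
translated = translator(synthetic)          # same shape as the input
assert translated.shape == synthetic.shape
```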
Deep hiding explores the hiding capability of deep learning-based models, aiming to conceal image-level messages in cover images and reveal them from the generated stego images. Existing schemes are easily detected by steganalyzers because of their large payloads, their reliance on feature extraction with either pure convolution or pure transformer operators within a single range, and their pixel-level loss constraints. To address these issues, we introduce generation-based adversarial attacks into color JPEG image deep hiding and propose MRAG, a multi-range representations-driven adversarial stego generation framework designed from a steganalysis perspective. Specifically, we integrate the local-range neighbor reception characteristic of convolution and the global-range dependency modeling of the transformer to construct MRAG. Meanwhile, we use transformed images obtained through coarse-grained and fine-grained frequency decomposition as inputs, introducing multi-grained information. Furthermore, a feature angle-norm disentanglement loss is designed to constrain the generated stegos to be closer to covers in the angle and norm space of the steganalyzer's classified features. Consequently, small yet effective adversarial perturbations can be injected into the stego generation process, ensuring that stegos retain favorable secret restorability and imperceptibility. Extensive experiments demonstrate that MRAG achieves state-of-the-art performance.
https://arxiv.org/abs/2507.08343
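The precise form of the feature angle-norm disentanglement loss is not given in the abstract; one minimal reading, assuming the loss separately penalises angular deviation and norm mismatch between the steganalyzer's features of stegos and covers, could look like this.

```python
import torch
import torch.nn.functional as F

def angle_norm_loss(stego_feat: torch.Tensor,
                    cover_feat: torch.Tensor,
                    w_angle: float = 1.0,
                    w_norm: float = 1.0) -> torch.Tensor:
    """Pull stego features toward cover features in angle and in norm.

    stego_feat, cover_feat: (batch, feat_dim) features taken from a
    steganalyzer's penultimate layer (an assumption for this sketch).
    """
    # angular term: 1 - cosine similarity, averaged over the batch
    angle_term = (1.0 - F.cosine_similarity(stego_feat, cover_feat, dim=1)).mean()
    # norm term: relative difference of feature magnitudes
    stego_norm = stego_feat.norm(dim=1)
    cover_norm = cover_feat.norm(dim=1)
    norm_term = ((stego_norm - cover_norm).abs() / (cover_norm + 1e-8)).mean()
    return w_angle * angle_term + w_norm * norm_term

# toy usage with random 128-dimensional features
loss = angle_norm_loss(torch.randn(8, 128), torch.randn(8, 128))
```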
With the advancement of vision-based autonomous driving technology, pedestrian detection has become an important component for improving traffic safety and driving system robustness. Nevertheless, in complex traffic scenarios, conventional pose estimation approaches frequently fail to accurately reconstruct occluded keypoints, primarily due to obstructions caused by vehicles, vegetation, or architectural elements. To address this issue, we propose a novel real-time occluded pedestrian pose completion framework termed Separation and Dimensionality Reduction-based Generative Adversarial Imputation Nets (SDR-GAIN). Unlike previous approaches that train visual models to distinguish occlusion patterns, SDR-GAIN aims to learn human pose directly from the numerical distribution of keypoint coordinates and interpolate missing positions. It employs a self-supervised adversarial learning paradigm to train lightweight generators with residual structures for the imputation of missing pose keypoints. Additionally, it integrates multiple pose standardization techniques to alleviate the difficulty of the learning process. Experiments conducted on the COCO and JAAD datasets demonstrate that SDR-GAIN surpasses conventional machine learning and Transformer-based missing data interpolation algorithms in accurately recovering occluded pedestrian keypoints, while simultaneously achieving microsecond-level real-time inference.
https://arxiv.org/abs/2306.03538
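SDR-GAIN's separation and dimensionality-reduction components are not detailed in the abstract; the sketch below only illustrates the underlying GAIN-style imputation idea it builds on: a generator fills in occluded keypoint coordinates while a discriminator predicts, per entry, whether the value was observed. The 17-keypoint flat layout, network sizes, and the omission of GAIN's hint mechanism are assumptions.

```python
import torch
import torch.nn as nn

N_KPTS = 17                       # COCO-style keypoints, (x, y) each
DIM = 2 * N_KPTS

def build_mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

# generator sees the observed coordinates plus the mask; the discriminator
# tries to predict, per entry, whether it was observed or imputed
generator = nn.Sequential(build_mlp(2 * DIM, DIM), nn.Sigmoid())
discriminator = build_mlp(DIM, DIM)

def impute(pose: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """pose: (B, DIM) normalised coordinates; mask: 1 = observed, 0 = occluded."""
    observed = pose * mask
    generated = generator(torch.cat([observed, mask], dim=1))
    # keep observed entries, fill occluded ones with generated values
    return observed + (1 - mask) * generated

# toy usage: random poses with roughly 30% of entries occluded
pose = torch.rand(8, DIM)
mask = (torch.rand(8, DIM) > 0.3).float()
completed = impute(pose, mask)

# discriminator loss on the completed pose (the generator objective would
# also include a reconstruction term on the observed entries)
d_logits = discriminator(completed)
d_loss = nn.functional.binary_cross_entropy_with_logits(d_logits, mask)
```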
Humanoid robots show significant potential in daily tasks. However, reinforcement learning-based motion policies often suffer from robustness degradation due to the sim-to-real dynamics gap, thereby affecting the agility of real robots. In this work, we propose a novel robust adversarial training paradigm designed to enhance the robustness of humanoid motion policies in the real world. The paradigm introduces a learnable adversarial attack network that precisely identifies vulnerabilities in motion policies and applies targeted perturbations, forcing the motion policy to enhance its robustness against perturbations through dynamic adversarial training. We conduct experiments on the Unitree G1 humanoid robot for both perceptive locomotion and whole-body control tasks. The results demonstrate that our proposed method significantly enhances the robot's motion robustness in real-world environments, enabling successful traversal of challenging terrains and highly agile whole-body trajectory tracking.
https://arxiv.org/abs/2507.08303
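The attack network and its training schedule are not described beyond the abstract; as a rough sketch of the idea, assume the adversary perturbs the policy's observations within a small bound and is updated to maximise the deviation of the resulting actions, with the motion policy later trained on those perturbed observations. The dimensions, perturbation bound, and losses below are placeholders.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EPS = 48, 12, 0.05   # assumed sizes and perturbation bound

policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.Tanh(),
                       nn.Linear(128, ACT_DIM))
attacker = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.Tanh(),
                         nn.Linear(128, OBS_DIM), nn.Tanh())  # output in [-1, 1]
attack_opt = torch.optim.Adam(attacker.parameters(), lr=1e-4)

def attacker_step(obs_batch: torch.Tensor) -> torch.Tensor:
    """One adversary update: make the policy's action deviate as much as possible."""
    clean_action = policy(obs_batch).detach()
    perturbed_obs = obs_batch + EPS * attacker(obs_batch)
    perturbed_action = policy(perturbed_obs)
    # the attacker maximises the action deviation, so minimise its negative
    loss = -(perturbed_action - clean_action).pow(2).mean()
    attack_opt.zero_grad(); loss.backward(); attack_opt.step()
    return perturbed_obs.detach()

# toy usage: the perturbed observations would then feed the policy's own
# RL update, hardening it against the learned attack
perturbed = attacker_step(torch.randn(32, OBS_DIM))
```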
We introduce a lightweight yet highly effective safety guardrail framework for language models, demonstrating that small-scale language models can achieve, and even surpass, the performance of larger counterparts in content moderation tasks. This is accomplished through high-fidelity synthetic data generation and adversarial training. The synthetic data generation process begins with human-curated seed data, which undergoes query augmentation and paraphrasing to create diverse and contextually rich examples. This augmented data is then subjected to multiple rounds of curation, ensuring high fidelity and relevance. Inspired by recent advances in the Generative Adversarial Network (GAN) architecture, our adversarial training employs reinforcement learning to guide a generator that produces challenging synthetic examples. These examples are used to fine-tune the safety classifier, enhancing its ability to detect and mitigate harmful content. Additionally, we incorporate strategies from recent research on efficient LLM training, leveraging the capabilities of smaller models to improve the performance of larger generative models. With iterative adversarial training and the generation of diverse, high-quality synthetic data, our framework enables small language models (SLMs) to serve as robust safety guardrails. This approach not only reduces computational overhead but also enhances resilience against adversarial attacks, offering a scalable and efficient solution for content moderation in AI systems.
https://arxiv.org/abs/2507.08284
The emergence of autonomous Large Language Model (LLM) agents capable of tool usage has introduced new safety risks that go beyond traditional conversational misuse. These agents, empowered to execute external functions, are vulnerable to both user-initiated threats (e.g., adversarial prompts) and tool-initiated threats (e.g., malicious outputs from compromised tools). In this paper, we propose the first unified safety-alignment framework for tool-using agents, enabling models to handle both channels of threat via structured reasoning and sandboxed reinforcement learning. We introduce a tri-modal taxonomy (benign, malicious, and sensitive) for both user prompts and tool responses, and define a policy-driven decision model. Our framework employs a custom-designed sandbox environment that simulates real-world tool execution and allows fine-grained reward shaping. Through extensive evaluations on public and self-built benchmarks, including Agent SafetyBench, InjecAgent, and BFCL, we demonstrate that our safety-aligned agents significantly improve resistance to security threats while preserving strong utility on benign tasks. Our results show that safety and effectiveness can be jointly optimized, laying the groundwork for trustworthy deployment of autonomous LLM agents.
https://arxiv.org/abs/2507.08270
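The abstract does not spell out the policy-driven decision model; one simple way to picture the tri-modal taxonomy is as a lookup from the (user-prompt label, tool-response label) pair to an agent action. The action names and the mapping below are hypothetical, not the paper's policy.

```python
from enum import Enum

class Label(Enum):
    BENIGN = "benign"
    MALICIOUS = "malicious"
    SENSITIVE = "sensitive"

class Action(Enum):
    PROCEED = "proceed"             # execute the tool call / return the result
    REFUSE = "refuse"               # decline and explain
    SANITIZE = "sanitize_and_warn"  # redact or strip sensitive content first

# hypothetical decision table: (prompt label, tool-response label) -> action
POLICY = {
    (Label.BENIGN, Label.BENIGN): Action.PROCEED,
    (Label.BENIGN, Label.SENSITIVE): Action.SANITIZE,
    (Label.SENSITIVE, Label.BENIGN): Action.SANITIZE,
    (Label.SENSITIVE, Label.SENSITIVE): Action.SANITIZE,
}

def decide(prompt_label: Label, tool_label: Label) -> Action:
    # anything involving a malicious channel is refused by default
    if Label.MALICIOUS in (prompt_label, tool_label):
        return Action.REFUSE
    return POLICY.get((prompt_label, tool_label), Action.REFUSE)

assert decide(Label.BENIGN, Label.MALICIOUS) is Action.REFUSE
```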
As large language models (LLMs) are increasingly deployed in critical applications, the challenge of jailbreaking, where adversaries manipulate the models to bypass safety mechanisms, has become a significant concern. This paper presents a dynamic Stackelberg game framework to model the interactions between attackers and defenders in the context of LLM jailbreaking. The framework treats the prompt-response dynamics as a sequential extensive-form game, where the defender, as the leader, commits to a strategy while anticipating the attacker's optimal responses. We propose a novel agentic AI solution, the "Purple Agent," which integrates adversarial exploration and defensive strategies using Rapidly-exploring Random Trees (RRT). The Purple Agent actively simulates potential attack trajectories and intervenes proactively to prevent harmful outputs. This approach offers a principled method for analyzing adversarial dynamics and provides a foundation for mitigating the risk of jailbreaking.
https://arxiv.org/abs/2507.08207
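For readers unfamiliar with the setup, the defender-as-leader commitment corresponds to the standard bi-level Stackelberg formulation (generic notation, not the paper's):

$$ \min_{\pi_D} \; L\big(\pi_D, \pi_A^{*}(\pi_D)\big) \quad \text{s.t.} \quad \pi_A^{*}(\pi_D) \in \arg\max_{\pi_A} U_A(\pi_D, \pi_A), $$

where the defender (leader) commits to a strategy $\pi_D$, the attacker (follower) plays a best response $\pi_A^{*}(\pi_D)$ maximising its utility $U_A$, and $L$ is the defender's loss, e.g. the probability that a jailbreak elicits a harmful output.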
We propose Adaptive Diffusion Denoised Smoothing, a method for certifying the predictions of a vision model against adversarial examples, while adapting to the input. Our key insight is to reinterpret a guided denoising diffusion model as a long sequence of adaptive Gaussian Differentially Private (GDP) mechanisms refining a pure noise sample into an image. We show that these adaptive mechanisms can be composed through a GDP privacy filter to analyze the end-to-end robustness of the guided denoising process, yielding a provable certification that extends the adaptive randomized smoothing analysis. We demonstrate that our design, under a specific guiding strategy, can improve both certified accuracy and standard accuracy on ImageNet for an $\ell_2$ threat model.
https://arxiv.org/abs/2507.08163
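As background for the claim that the analysis "extends the adaptive randomized smoothing analysis": in standard (non-adaptive) randomized smoothing with Gaussian noise of scale $\sigma$, the smoothed classifier's prediction is certified within an $\ell_2$ radius (Cohen et al., 2019)

$$ R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right), $$

where $\underline{p_A}$ and $\overline{p_B}$ bound the probabilities of the top and runner-up classes under the noise and $\Phi^{-1}$ is the standard normal quantile function. The GDP composition in this paper is aimed at carrying a guarantee of this kind through a long sequence of input-adaptive denoising mechanisms rather than a single fixed-$\sigma$ mechanism.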
Unrestricted adversarial attacks aim to fool computer vision models without being constrained by $\ell_p$-norm bounds to remain imperceptible to humans, for example, by changing an object's color. This allows attackers to circumvent traditional, norm-bounded defense strategies such as adversarial training or certified defense strategies. However, due to their unrestricted nature, there are also no guarantees of norm-based imperceptibility, necessitating human evaluations to verify just how authentic these adversarial examples look. While some related work assesses this vital quality of adversarial attacks, none provide statistically significant insights. This calls for a unified framework that supports and streamlines such assessments for evaluating and comparing unrestricted attacks. To close this gap, we introduce SCOOTER - an open-source, statistically powered framework for evaluating unrestricted adversarial examples. Our contributions are: $(i)$ best-practice guidelines for crowd-study power, compensation, and Likert equivalence bounds to measure imperceptibility; $(ii)$ the first large-scale human vs. model comparison across 346 human participants showing that three color-space attacks and three diffusion-based attacks fail to produce imperceptible images. Furthermore, we found that GPT-4o can serve as a preliminary test for imperceptibility, but it only consistently detects adversarial examples for four out of six tested attacks; $(iii)$ open-source software tools, including a browser-based task template to collect annotations and analysis scripts in Python and R; $(iv)$ an ImageNet-derived benchmark dataset containing 3K real images, 7K adversarial examples, and over 34K human ratings. Our findings demonstrate that automated vision systems do not align with human perception, reinforcing the need for a ground-truth SCOOTER benchmark.
https://arxiv.org/abs/2507.07776
Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well-distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared features (weak classes) remain disproportionately susceptible to adversarial attacks. We observe that strong classes do not require strong adversaries during training, as their non-robust features are quickly suppressed. In contrast, weak classes benefit from stronger adversaries to effectively reduce their vulnerabilities. Motivated by this, we introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes, promoting feature diversity via uniformly sampled targets, and stronger untargeted adversaries to weak classes, enhancing their focused robustness. TRIX further incorporates per-class loss weighting and perturbation strength adjustments, building on prior work, to emphasize weak classes during the optimization. Comprehensive experiments on standard image classification benchmarks, including evaluations under strong attacks such as PGD and AutoAttack, demonstrate that TRIX significantly improves worst-case class accuracy on both clean and adversarial data, reduces inter-class robustness disparities, and preserves overall accuracy. Our results highlight TRIX as a practical step toward fair and effective adversarial defense.
https://arxiv.org/abs/2507.07768
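The concrete schedules TRIX uses are not in the abstract; the fragment below only illustrates the mechanics it describes: per-class adversary strength (stronger untargeted PGD for weak classes, weaker targeted PGD toward uniformly sampled targets for strong classes) and per-class loss weights inside an otherwise standard adversarial-training step. The class split, $\epsilon$ values, and weights are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps, steps=5, targeted=False, target=None):
    """Standard L-inf PGD; with targeted=True the attack moves toward `target`."""
    alpha = 2.5 * eps / steps
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target if targeted else y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        step = -alpha * grad.sign() if targeted else alpha * grad.sign()
        x_adv = (x_adv + step).clamp(x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def trix_like_step(model, opt, x, y, weak_classes, n_classes,
                   eps_weak=8 / 255, eps_strong=4 / 255, w_weak=2.0):
    """One training step with class-dependent adversaries and loss weights.

    weak_classes: 1-D LongTensor listing the class indices treated as weak.
    """
    is_weak = torch.isin(y, weak_classes)
    x_adv = x.clone()
    if is_weak.any():        # weak classes: stronger, untargeted adversary
        x_adv[is_weak] = pgd(model, x[is_weak], y[is_weak], eps_weak)
    if (~is_weak).any():     # strong classes: weaker adversary, random target
        tgt = torch.randint(0, n_classes, y[~is_weak].shape)
        x_adv[~is_weak] = pgd(model, x[~is_weak], y[~is_weak], eps_strong,
                              targeted=True, target=tgt)
    weights = torch.ones_like(y, dtype=torch.float)
    weights[is_weak] = w_weak
    per_sample = F.cross_entropy(model(x_adv), y, reduction="none")
    loss = (weights * per_sample).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In TRIX itself the adversary assignment and weighting schedules are learned or tuned as described in the paper; the fixed values above stand in for them only to show where they enter the update.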
Unified vision-language models (VLMs) have recently shown remarkable progress, enabling a single model to flexibly address diverse tasks through different instructions within a shared computational architecture. This instruction-based control mechanism creates unique security challenges, as adversarial inputs must remain effective across multiple task instructions that may be unpredictably applied to process the same malicious content. In this paper, we introduce CrossVLAD, a new benchmark dataset carefully curated from MSCOCO with GPT-4-assisted annotations for systematically evaluating cross-task adversarial attacks on unified VLMs. CrossVLAD centers on the object-change objective, consistently manipulating a target object's classification across four downstream tasks, and proposes a novel success rate metric that measures simultaneous misclassification across all tasks, providing a rigorous evaluation of adversarial transferability. To tackle this challenge, we present CRAFT (Cross-task Region-based Attack Framework with Token-alignment), an efficient region-centric attack method. Extensive experiments on Florence-2 and other popular unified VLMs demonstrate that our method outperforms existing approaches in both overall cross-task attack performance and targeted object-change success rates, highlighting its effectiveness in adversarially influencing unified VLMs across diverse tasks.
https://arxiv.org/abs/2507.07709
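The "simultaneous misclassification across all tasks" success criterion can be made concrete with a small helper; the per-task predicates below are assumed simplifications, not the benchmark's actual checks.

```python
from typing import Callable, Dict, List

def cross_task_success_rate(
    examples: List[dict],
    task_predicates: Dict[str, Callable[[dict], bool]],
) -> float:
    """Fraction of adversarial examples that fool *every* task simultaneously.

    `examples` holds per-example model outputs for each task; each predicate
    returns True when that task's output reflects the intended object change.
    """
    if not examples:
        return 0.0
    hits = sum(
        all(pred(ex) for pred in task_predicates.values()) for ex in examples
    )
    return hits / len(examples)

# toy usage with two hypothetical tasks
preds = {
    "captioning": lambda ex: ex["caption_mentions_target_class"],
    "detection": lambda ex: ex["detected_class"] == ex["target_class"],
}
batch = [
    {"caption_mentions_target_class": True,
     "detected_class": "dog", "target_class": "dog"},
    {"caption_mentions_target_class": True,
     "detected_class": "cat", "target_class": "dog"},
]
print(cross_task_success_rate(batch, preds))  # 0.5
```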
With the increasing deployment of intelligent CCTV systems in outdoor environments, there is a growing demand for face recognition systems optimized for challenging weather conditions. Adverse weather significantly degrades image quality, which in turn reduces recognition accuracy. Although recent face image restoration (FIR) models based on generative adversarial networks (GANs) and diffusion models have shown progress, their performance remains limited due to the lack of dedicated modules that explicitly address weather-induced degradations. This leads to distorted facial textures and structures. To address these limitations, we propose a novel GAN-based blind FIR framework that integrates two key components: local Statistical Facial Feature Transformation (SFFT) and Degradation-Agnostic Feature Embedding (DAFE). The local SFFT module enhances facial structure and color fidelity by aligning the local statistical distributions of low-quality (LQ) facial regions with those of high-quality (HQ) counterparts. Complementarily, the DAFE module enables robust statistical facial feature extraction under adverse weather conditions by aligning LQ and HQ encoder representations, thereby making the restoration process adaptive to severe weather-induced degradations. Experimental results demonstrate that the proposed degradation-agnostic SFFT model outperforms existing state-of-the-art FIR methods based on GAN and diffusion models, particularly in suppressing texture distortions and accurately reconstructing facial structures. Furthermore, both the SFFT and DAFE modules are empirically shown to enhance structural fidelity and perceptual quality in face restoration under challenging weather scenarios.
https://arxiv.org/abs/2507.07464
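The abstract describes local SFFT as aligning local statistical distributions of LQ regions with HQ counterparts; a minimal moment-matching (AdaIN-style) version of that idea, renormalising each region's channel mean and standard deviation, is sketched below. The grid partitioning and the use of only first and second moments are assumptions.

```python
import torch

def local_stat_transfer(lq: torch.Tensor, hq: torch.Tensor,
                        grid: int = 4, eps: float = 1e-5) -> torch.Tensor:
    """Match per-region channel mean/std of `lq` to those of `hq`.

    lq, hq: (B, C, H, W) tensors; the image is split into a grid x grid
    set of non-overlapping regions and each region is renormalised.
    """
    b, c, h, w = lq.shape
    assert h % grid == 0 and w % grid == 0, "sketch assumes divisible sizes"
    out = lq.clone()
    rh, rw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            sl = (slice(None), slice(None),
                  slice(i * rh, (i + 1) * rh), slice(j * rw, (j + 1) * rw))
            lq_r, hq_r = lq[sl], hq[sl]
            lq_mu = lq_r.mean(dim=(2, 3), keepdim=True)
            lq_sd = lq_r.std(dim=(2, 3), keepdim=True) + eps
            hq_mu = hq_r.mean(dim=(2, 3), keepdim=True)
            hq_sd = hq_r.std(dim=(2, 3), keepdim=True) + eps
            out[sl] = (lq_r - lq_mu) / lq_sd * hq_sd + hq_mu
    return out

# toy usage on random "degraded" and "reference" face crops
restored_stats = local_stat_transfer(torch.rand(2, 3, 128, 128),
                                     torch.rand(2, 3, 128, 128))
```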
Critical infrastructure systems, including energy grids, healthcare facilities, transportation networks, and water distribution systems, are pivotal to societal stability and economic resilience. However, the increasing interconnectivity of these systems exposes them to various cyber threats, including ransomware, Denial-of-Service (DoS) attacks, and Advanced Persistent Threats (APTs). This paper examines cybersecurity vulnerabilities in critical infrastructure, highlighting the threat landscape, attack vectors, and the role of Artificial Intelligence (AI) in mitigating these risks. We propose a hybrid AI-driven cybersecurity framework to enhance real-time vulnerability detection, threat modelling, and automated remediation. This study also addresses the complexities of adversarial AI, regulatory compliance, and integration. Our findings provide actionable insights to strengthen the security and resilience of critical infrastructure systems against emerging cyber threats.
https://arxiv.org/abs/2507.07416
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. Through experiments on a curated dataset, we show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues. We also investigate the impact of zero-shot and few-shot prompting strategies, revealing that LLM-rephrased emails can significantly degrade the performance of both ML and LLM-based detectors. Our benchmarking highlights that models like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above 80%, using only 17GB of VRAM, supporting their viability for cost-efficient deployment. We further assess the models' adversarial robustness and cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide concise, interpretable explanations to support real-time decision-making. These findings position optimized LLMs as promising components in phishing defence systems and offer a path forward for integrating explainable, efficient AI into modern cybersecurity frameworks.
https://arxiv.org/abs/2507.07406
With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show that there exist LLMs for which there are no efficient prompt filters: adversarial prompts that elicit harmful behavior can be easily constructed, which are computationally indistinguishable from benign prompts for any efficient filter. Our second main result identifies a natural setting in which output filtering is computationally intractable. All of our separation results are under cryptographic hardness assumptions. In addition to these core findings, we also formalize and study relaxed mitigation approaches, demonstrating further computational barriers. We conclude that safety cannot be achieved by designing filters external to the LLM internals (architecture and weights); in particular, black-box access to the LLM will not suffice. Based on our technical results, we argue that an aligned AI system's intelligence cannot be separated from its judgment.
https://arxiv.org/abs/2507.07341
Recent advances in large vision-language models have led to impressive performance in visual question answering and multimodal reasoning. However, it remains unclear whether these models genuinely perform grounded visual reasoning or rely on superficial patterns and dataset biases. In this work, we introduce MagiC, a comprehensive benchmark designed to evaluate grounded multimodal cognition, assessing not only answer accuracy but also the quality of step-by-step reasoning and its alignment with relevant visual evidence. Our benchmark includes approximately 5,500 weakly supervised QA examples generated from strong model outputs and 900 human-curated examples with fine-grained annotations, including answers, rationales, and bounding box groundings. We evaluate 15 vision-language models ranging from 7B to 70B parameters across four dimensions: final answer correctness, reasoning validity, grounding fidelity, and self-correction ability. MagiC further includes diagnostic settings to probe model robustness under adversarial visual cues and assess their capacity for introspective error correction. We introduce new metrics such as MagiScore and StepSense, and provide comprehensive analyses that reveal key limitations and opportunities in current approaches to grounded visual reasoning.
https://arxiv.org/abs/2507.07297
As machine learning models become increasingly deployed across the edge of internet of things environments, a partitioned deep learning paradigm in which models are split across multiple computational nodes introduces a new dimension of security risk. Unlike traditional inference setups, these distributed pipelines span the model computation across heterogeneous nodes and communication layers, thereby exposing a broader attack surface to potential adversaries. Building on these motivations, this work explores a previously overlooked vulnerability: even when both the edge and cloud components of the model are inaccessible (i.e., black-box), an adversary who intercepts the intermediate features transmitted between them can still pose a serious threat. We demonstrate that, under these mild and realistic assumptions, an attacker can craft highly transferable proxy models, making the entire deep learning system significantly more vulnerable to evasion attacks. In particular, the intercepted features can be effectively analyzed and leveraged to distill surrogate models capable of crafting highly transferable adversarial examples against the target model. To this end, we propose an exploitation strategy specifically designed for distributed settings, which involves reconstructing the original tensor shape from vectorized transmitted features using simple statistical analysis, and adapting surrogate architectures accordingly to enable effective feature distillation. A comprehensive and systematic experimental evaluation has been conducted to demonstrate that surrogate models trained with the proposed strategy, i.e., leveraging intermediate features, tremendously improve the transferability of adversarial attacks. These findings underscore the urgent need to account for intermediate feature leakage in the design of secure distributed deep learning systems.
https://arxiv.org/abs/2507.07259
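The paper's shape-recovery and distillation procedure is its own; the sketch below illustrates the general recipe under explicit assumptions: the attacker can feed its own inputs to the edge device and capture the corresponding vectorised features, guesses a (C, H, W) layout consistent with the vector length, and trains a surrogate edge network by regressing onto the intercepted features.

```python
import torch
import torch.nn as nn

def guess_feature_shape(vec_len: int, channels=(256, 128, 64, 32)):
    """Pick a plausible (C, H, W) with H == W whose product equals vec_len."""
    for c in channels:
        if vec_len % c == 0:
            hw = vec_len // c
            side = int(hw ** 0.5)
            if side * side == hw:
                return c, side, side
    raise ValueError("no square spatial layout found for this length")

class SurrogateEdge(nn.Module):
    """Small CNN mapping attacker-chosen inputs to the intercepted-feature shape."""
    def __init__(self, out_shape):
        super().__init__()
        c, h, w = out_shape
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, c, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d((h, w)),
        )

    def forward(self, x):
        return self.backbone(x)

# toy usage: vectors of length 16384, whose true layout here is 64 x 16 x 16;
# several layouts may fit, which is why the paper resorts to statistical
# analysis of the transmitted features to disambiguate
shape = guess_feature_shape(16384, channels=(64, 32, 16))
surrogate = SurrogateEdge(shape)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

queries = torch.rand(8, 3, 128, 128)        # attacker-chosen inputs
intercepted = torch.randn(8, 16384)         # captured feature vectors
target = intercepted.view(8, *shape)

pred = surrogate(queries)
loss = nn.functional.mse_loss(pred, target)  # feature-distillation loss
opt.zero_grad(); loss.backward(); opt.step()
```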
Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation suites often comprise only a limited number of homogeneous test cases, resulting in subtle faults going undetected. This not only artificially inflates measured performance but also compromises accurate reward estimation in reinforcement learning frameworks utilizing verifiable rewards (RLVR). To address these critical shortcomings, we systematically investigate the test-case generation (TCG) task by proposing multi-dimensional metrics designed to rigorously quantify test-suite thoroughness. Furthermore, we introduce a human-LLM collaborative method (SAGA), leveraging human programming expertise with LLM reasoning capability, aimed at significantly enhancing both the coverage and the quality of generated test cases. In addition, we develop a TCGBench to facilitate the study of the TCG task. Experiments show that SAGA achieves a detection rate of 90.62% and a verifier accuracy of 32.58% on TCGBench. The Verifier Accuracy (Verifier Acc) of the code generation evaluation benchmark synthesized by SAGA is 10.78% higher than that of LiveCodeBench-v6. These results demonstrate the effectiveness of our proposed method. We hope this work contributes to building a scalable foundation for reliable LLM code evaluation, further advancing RLVR in code generation, and paving the way for automated adversarial test synthesis and adaptive benchmark integration.
https://arxiv.org/abs/2507.06920
Cyber-attacks jeopardize the safe operation of smart microgrids. At the same time, existing diagnostic methods either depend on expensive multi-point instrumentation or rely on stringent modelling assumptions that are untenable under single-sensor constraints. This paper proposes a Fractional-Order Memory-Enhanced Attack-Diagnosis Scheme (FO-MADS) that achieves low-latency fault localisation and cyber-attack detection using only one VPQ (Voltage-Power-Reactive-power) sensor. FO-MADS first constructs a dual fractional-order feature library by jointly applying Caputo and Grünwald-Letnikov derivatives, thereby amplifying micro-perturbations and slow drifts in the VPQ signal. A two-stage hierarchical classifier then pinpoints the affected inverter and isolates the faulty IGBT switch, effectively alleviating class imbalance. Robustness is further strengthened through Progressive Memory-Replay Adversarial Training (PMR-AT), whose attack-aware loss is dynamically re-weighted via Online Hard Example Mining (OHEM) to prioritise the most challenging samples. Experiments on a four-inverter microgrid testbed comprising 1 normal and 24 fault classes under four attack scenarios demonstrate diagnostic accuracies of 96.6 % (bias), 94.0 % (noise), 92.8 % (data replacement), and 95.7 % (replay), while sustaining 96.7 % under attack-free conditions. These results establish FO-MADS as a cost-effective and readily deployable solution that markedly enhances the cyber-physical resilience of smart microgrids.
https://arxiv.org/abs/2507.06890
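The Caputo branch of the dual feature library is not easy to reproduce compactly, but the Grünwald-Letnikov side can be illustrated directly: the discrete GL fractional derivative of order $\alpha$ weights past samples with generalised binomial coefficients, which is what lets micro-perturbations and slow drifts in a VPQ-like signal stand out. The sampling step, order, and toy signal below are placeholders.

```python
import numpy as np

def gl_fractional_derivative(signal: np.ndarray, alpha: float, h: float = 1.0):
    """Grünwald-Letnikov fractional derivative of order `alpha`.

    Uses the standard recursive binomial weights
        w_0 = 1,  w_k = w_{k-1} * (1 - (alpha + 1) / k),
    so that D^alpha f[n] ~= h**(-alpha) * sum_k w_k * f[n - k].
    """
    n = len(signal)
    weights = np.empty(n)
    weights[0] = 1.0
    for k in range(1, n):
        weights[k] = weights[k - 1] * (1.0 - (alpha + 1.0) / k)
    out = np.empty(n)
    for i in range(n):
        # dot the weights with the most recent i+1 samples, newest first
        out[i] = np.dot(weights[: i + 1], signal[i::-1]) / h ** alpha
    return out

# toy usage: a half-order derivative of a slow drift plus a micro-perturbation,
# mimicking the kind of VPQ feature FO-MADS amplifies
t = np.linspace(0, 10, 500)
vpq_like = 0.01 * t + 0.002 * np.sin(40 * t)
feature = gl_fractional_derivative(vpq_like, alpha=0.5, h=t[1] - t[0])
```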