Large language models have become increasingly prominent, signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in much of the existing literature. In this paper, we address patched visual prompt injection, where adversaries exploit adversarial patches to make VLMs generate target content. Our investigation reveals that patched adversarial prompts are sensitive to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques and specifically tailored to protect VLMs from patched visual prompt injectors. Our framework lowers the attack success rate to between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery on benign images, demonstrating a balance between security and usability.
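As a concrete illustration of the smoothing idea, here is a minimal sketch of pixel-wise randomized smoothing for a VLM; the `vlm_generate(image, prompt)` callable, the masking rate, and the majority vote are our assumptions, not the authors' implementation.

```python
# A minimal sketch of pixel-wise randomized smoothing for a VLM (assumed
# interface, not SmoothVLM's released code).
import numpy as np

def smoothed_vlm_response(image: np.ndarray, prompt: str, vlm_generate,
                          num_copies: int = 10, mask_rate: float = 0.3,
                          seed: int = 0) -> str:
    """Query the VLM on several pixel-randomized copies of the image and
    return the most frequent response (majority vote)."""
    rng = np.random.default_rng(seed)
    responses = []
    for _ in range(num_copies):
        # Randomly zero out a fraction of pixels; a patch-based injection
        # is unlikely to survive on most randomized copies.
        mask = rng.random(image.shape[:2]) < mask_rate
        randomized = image.copy()
        randomized[mask] = 0
        responses.append(vlm_generate(randomized, prompt))
    return max(set(responses), key=responses.count)
```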
https://arxiv.org/abs/2405.10529
Trigger points are a concept introduced by Mau, Lux, and Westheuser (2023) in qualitative focus-group studies to understand polarisation in Germany. In communication, trigger points are moments when individuals feel that their understanding of what is fair, normal, or appropriate in society is being questioned. In the original studies, individuals react affectively to such triggers and show strong, negative emotional responses. In this paper, we present the first systematic study of the large-scale effect of individual words as trigger points, analysing a large number of social media posts. We examine online deliberations on Reddit between 2020 and 2022 and collect >100 million posts from subreddits related to a set of words identified as trigger points in UK politics. We find that such trigger words affect user engagement and have noticeable consequences for animosity in online discussions. We provide empirical evidence that trigger words cause animosity and create incentives for hate speech, adversarial debates, and disagreement. Our work is the first to introduce trigger points to computational studies of online communication. Our findings are relevant to researchers interested in online harms and in how citizens debate politics and society in light of affective polarisation.
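To make the measurement concrete, here is a hedged sketch of how one might contrast posts containing trigger words with the rest of a corpus; the column names (`text`, `toxicity`, `num_comments`) and the example word list are hypothetical, not the paper's.

```python
# An illustrative sketch (not the authors' pipeline): flag posts containing
# trigger words and compare engagement/toxicity proxies across groups.
import pandas as pd

TRIGGER_WORDS = {"immigration", "brexit", "benefits"}  # hypothetical examples

def trigger_effect(posts: pd.DataFrame) -> pd.DataFrame:
    """Split posts by trigger-word presence and compare group means of a
    toxicity proxy and an engagement proxy."""
    has_trigger = posts["text"].str.lower().str.contains(
        "|".join(TRIGGER_WORDS), regex=True)
    return posts.groupby(has_trigger)[["toxicity", "num_comments"]].mean()
```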
https://arxiv.org/abs/2405.10213
Multi-modal Large Language Models (MLLMs) have recently achieved enhanced performance across various vision-language tasks, including visual grounding. However, the adversarial robustness of visual grounding in MLLMs remains unexplored. To fill this gap, we use referring expression comprehension (REC) as an example visual grounding task and propose three adversarial attack paradigms. First, untargeted adversarial attacks induce MLLMs to generate incorrect bounding boxes for each object. Second, exclusive targeted adversarial attacks force all generated outputs to the same target bounding box. Third, permuted targeted adversarial attacks aim to permute all bounding boxes among the different objects within a single image. Extensive experiments demonstrate that the proposed methods successfully attack the visual grounding capabilities of MLLMs. Our methods not only provide a new perspective for designing novel attacks but also serve as a strong baseline for improving the adversarial robustness of visual grounding in MLLMs.
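The untargeted paradigm can be sketched as a standard PGD loop; since real MLLMs emit boxes as text, the differentiable `model(image) -> boxes` below is a schematic surrogate, and the loss and budget are our assumptions rather than the paper's formulation.

```python
# A hedged PGD sketch of an untargeted bounding-box attack under an L_inf
# budget, assuming a differentiable surrogate box predictor.
import torch

def untargeted_box_attack(model, image, gt_boxes, eps=8 / 255,
                          alpha=1 / 255, steps=20):
    """Maximize the L1 error between predicted and ground-truth boxes."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = torch.abs(model(adv) - gt_boxes).mean()  # push boxes away
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()               # gradient ascent
            adv = image + (adv - image).clamp(-eps, eps)  # project to ball
            adv = adv.clamp(0, 1).detach()
    return adv
```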
https://arxiv.org/abs/2405.09981
Infrared physical adversarial examples are of great significance for studying the security of infrared AI systems that are widely used in daily life, such as autonomous driving. Previous infrared physical attacks have mainly focused on 2D infrared pedestrian detection, which may not fully manifest their destructiveness to AI systems. In this work, we propose a physical attack method against infrared detectors based on 3D modeling, applied to a real car. The goal is to design a set of infrared adversarial stickers that make cars invisible to infrared detectors at various viewing angles, distances, and scenes. We build a 3D infrared car model with realistic infrared characteristics and propose an infrared adversarial pattern generation method based on 3D mesh shadows. We propose a 3D control-point-based mesh smoothing algorithm and use a set of smoothness loss functions to enhance the smoothness of adversarial meshes and facilitate sticker fabrication. Besides, we designed aluminum stickers and conducted physical experiments on two real Mercedes-Benz A200L cars. Our adversarial stickers hid the cars from Faster RCNN, an object detector, at various viewing angles, distances, and scenes. The attack success rate (ASR) was 91.49% for real cars. In comparison, the ASRs of random stickers and no sticker were only 6.21% and 0.66%, respectively. In addition, the ASRs of the designed stickers against six unseen object detectors, such as YOLOv3 and Deformable DETR, ranged from 73.35% to 95.80%, showing good transferability of the attack performance across detectors.
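One ingredient that translates directly into code is a mesh smoothness term; the Laplacian-style penalty below is our illustrative formulation, not the authors' exact loss.

```python
# A minimal sketch of a mesh smoothness term in the spirit of the paper's
# smoothness losses: penalize each vertex's distance from the centroid of
# its neighbors (the exact formulation is ours).
import torch

def laplacian_smoothness_loss(vertices: torch.Tensor,
                              neighbors: list[list[int]]) -> torch.Tensor:
    """vertices: (V, 3) tensor; neighbors[i]: indices adjacent to vertex i."""
    loss = vertices.new_zeros(())
    for i, nbrs in enumerate(neighbors):
        if nbrs:
            centroid = vertices[nbrs].mean(dim=0)
            loss = loss + ((vertices[i] - centroid) ** 2).sum()
    return loss / len(neighbors)
```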
https://arxiv.org/abs/2405.09924
With the rapid development of face recognition (FR) systems, the privacy of face images on social media faces severe challenges due to the abuse of unauthorized FR systems. Some studies use adversarial attack techniques to defend against malicious FR systems by generating adversarial examples. However, the generated adversarial examples, i.e., the protected face images, tend to suffer from subpar visual quality and low transferability. In this paper, we propose a novel face protection approach, dubbed DiffAM, which leverages the powerful generative ability of diffusion models to produce high-quality protected face images with adversarial makeup transferred from reference images. Specifically, we first introduce a makeup removal module that generates non-makeup images using a fine-tuned diffusion model guided by textual prompts in CLIP space. As the inverse process of makeup transfer, makeup removal makes it easier to establish a deterministic relationship between the makeup and non-makeup domains without elaborate text prompts. With this relationship, a CLIP-based makeup loss and an ensemble attack strategy jointly guide the direction of the adversarial makeup domain, yielding protected face images with natural-looking makeup and high black-box transferability. Extensive experiments demonstrate that DiffAM achieves higher visual quality and attack success rates, with a gain of 12.98% under the black-box setting compared with the state of the art. The code will be available at this https URL.
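A directional CLIP-space loss is one plausible reading of the CLIP-based makeup loss; the sketch below aligns the source-to-generated edit direction with the reference no-makeup-to-makeup direction, and is an assumption rather than DiffAM's released code.

```python
# A sketch of a directional CLIP loss (our reading of the abstract):
# `enc` is assumed to map a batch of images to CLIP embeddings.
import torch.nn.functional as F

def directional_clip_loss(enc, src_img, gen_img, ref_makeup, ref_nomakeup):
    """Encourage the src->gen edit direction in CLIP space to match the
    no-makeup->makeup direction of the reference pair."""
    edit_dir = enc(gen_img) - enc(src_img)
    makeup_dir = enc(ref_makeup) - enc(ref_nomakeup)
    return 1 - F.cosine_similarity(edit_dir, makeup_dir, dim=-1).mean()
```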
https://arxiv.org/abs/2405.09882
Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under a real-world threat model in which both the protected model and the watermark extractor are black boxes. Under this setting, we carry out three studies. 1) We develop an extractor-gradient-guided (EGG) remover and show its effectiveness when the extractor uses ReLU activation only. 2) More generally, for an unknown extractor, we leverage adversarial attacks and design the EGG remover based on estimated gradients. 3) Under the most stringent condition, in which the extractor is inaccessible, we design a transferable remover based on a set of private proxy models. In all cases, the proposed removers successfully remove embedded watermarks while preserving the quality of the processed images, and we also demonstrate that the EGG remover can even replace the watermarks. Extensive experimental results verify the effectiveness and generalizability of the proposed attacks, revealing the vulnerabilities of existing box-free methods and calling for further research.
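For the estimated-gradient case (study 2), the idea can be sketched with a zeroth-order, NES-style finite-difference estimator; the `extractor(img)` signature and all hyperparameters below are placeholders, not the paper's configuration.

```python
# A schematic zeroth-order variant of gradient-guided watermark removal:
# descend the energy of the (black-box) extractor's output.
import numpy as np

def remove_watermark(extractor, img, steps=100, lr=1e-3,
                     sigma=0.01, samples=20, seed=0):
    rng = np.random.default_rng(seed)
    out = img.astype(np.float64).copy()
    for _ in range(steps):
        grad = np.zeros_like(out)
        for _ in range(samples):
            u = rng.standard_normal(out.shape)
            f_plus = np.sum(extractor(out + sigma * u) ** 2)
            f_minus = np.sum(extractor(out - sigma * u) ** 2)
            grad += (f_plus - f_minus) / (2 * sigma) * u  # NES estimate
        out -= lr * grad / samples  # descend the watermark energy
        out = np.clip(out, 0.0, 1.0)
    return out
```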
https://arxiv.org/abs/2405.09863
We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices, where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains a competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves a competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.
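For reference, the classic RANKING algorithm is simple to state in code (variable names are ours): fix a random permutation of the offline vertices, then match each arriving online vertex to its highest-ranked free neighbor.

```python
# The RANKING algorithm of Karp et al. [1990], sketched for reference.
import random

def ranking(offline: list, online_neighbors: list[list]) -> dict:
    """online_neighbors[t] lists the offline vertices adjacent to the t-th
    arriving online vertex; returns the matching as {online_index: offline}."""
    perm = random.sample(offline, len(offline))      # random permutation
    rank = {v: r for r, v in enumerate(perm)}
    matched, matching = set(), {}
    for t, nbrs in enumerate(online_neighbors):
        free = [v for v in nbrs if v not in matched]
        if free:
            best = min(free, key=rank.__getitem__)   # highest-ranked neighbor
            matched.add(best)
            matching[t] = best
    return matching
```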
https://arxiv.org/abs/2405.09784
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security has received more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have shown that surreptitiously crafted adversarial perturbations can manipulate speech recognition systems into producing malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modification. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications driven by optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of the Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve an attack success rate of 82%, while preserving sound naturalness, as confirmed by our user study.
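The sequential STA variant can be sketched as style transfer followed by an $\ell_\infty$-bounded perturbation; `tts_with_style` and `asr_loss` below are hypothetical stand-ins for a TTS system and a differentiable ASR loss toward the target command.

```python
# A schematic of the sequential style-transfer-then-attack idea; the two
# callables are assumed interfaces, not the paper's code.
import torch

def style_transfer_attack(asr_loss, tts_with_style, text, style_ref,
                          target_cmd, eps=0.002, alpha=2e-4, steps=100):
    """Synthesize styled audio, then add a small L_inf perturbation that
    steers the ASR transcription toward `target_cmd`."""
    audio = tts_with_style(text, style_ref).detach()
    adv = audio.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = asr_loss(adv, target_cmd)          # e.g. CTC loss to target
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv - alpha * grad.sign()       # descend toward target
            adv = audio + (adv - audio).clamp(-eps, eps)
            adv = adv.detach()
    return adv
```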
https://arxiv.org/abs/2405.09470
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Further, these adversarial examples are found to be transferable from the source network in which they are crafted to a black-box target network. As the trend of using deep learning on embedded devices grows, it becomes relevant to study the transferability properties of adversarial examples among compressed networks. In this paper, we consider quantization as a network compression technique and evaluate the performance of transfer-based attacks when the source and target networks are quantized at different bitwidths. We explore how algorithm-specific properties affect transferability by considering various adversarial example generation algorithms. Furthermore, we examine transferability in a more realistic scenario where the source and target networks may differ in bitwidth and other model-related properties like capacity and architecture. We find that although quantization reduces transferability, certain attack types demonstrate an ability to enhance it. Additionally, the average transferability of adversarial examples among quantized versions of a network can be used to estimate the transferability to quantized target networks with varying capacity and architecture.
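A minimal harness for this kind of study might craft FGSM examples on the source network and measure the fooling rate on each quantized target; the sketch below is illustrative, not the paper's protocol.

```python
# Craft adversarial examples on a source model (FGSM) and measure how often
# they transfer to a (quantized) target model.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def transfer_rate(target, adv, y) -> float:
    """Fraction of adversarial examples misclassified by the target."""
    return (target(adv).argmax(dim=1) != y).float().mean().item()
```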
https://arxiv.org/abs/2405.09598
With the benefit of deep learning techniques, recent research has made significant progress in reducing image compression artifacts. Despite their improved performance, prevailing methods focus only on learning a mapping from the compressed image to the original one, ignoring the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Unlike these methods, we propose to decouple the intrinsic attributes into two complementary features for artifact reduction: compression-insensitive features, which regularize the high-level semantic representations during training, and compression-sensitive features, which are aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features so as to retain high-level semantics, and we then develop a compression quality-aware feature encoder for the compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) that uses these awareness features as transformation guidance during the decoding phase. In DAGN, we develop a cross-feature fusion module that maintains the consistency of compression-insensitive features by fusing them into the artifact-reduction baseline. Our method achieves an average 2.06 dB PSNR gain on BSD500, outperforming state-of-the-art methods, and requires only 29.7 ms to process one image. Besides, experimental results on LIVE1 and LIU4K also demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
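Our reading of the adversarial regularization step can be sketched as a feature-level GAN game; the encoder `E`, discriminator `D`, and loss pairing below are assumptions based on the abstract, not the released code.

```python
# A schematic of adversarial feature regularization: D tries to tell
# compressed-image features from original-image features, while E is
# trained to make them indistinguishable (retaining high-level semantics).
import torch
import torch.nn.functional as F

def adversarial_feature_losses(E, D, compressed, original):
    f_comp, f_orig = E(compressed), E(original)
    logits_orig = D(f_orig)
    logits_comp_d = D(f_comp.detach())
    d_loss = (F.binary_cross_entropy_with_logits(
                  logits_orig, torch.ones_like(logits_orig))
              + F.binary_cross_entropy_with_logits(
                  logits_comp_d, torch.zeros_like(logits_comp_d)))
    logits_comp = D(f_comp)
    e_loss = F.binary_cross_entropy_with_logits(
        logits_comp, torch.ones_like(logits_comp))
    return d_loss, e_loss
```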
https://arxiv.org/abs/2405.09291
In this work, we tackle several challenges hindering the development of Automatic Target Detection (ATD) algorithms for ground targets in SAR images. To address the lack of representative training data, we propose a Deep Learning approach that trains ATD models with synthetic target signatures produced by the MOCEM simulator. We define an incrustation pipeline to incorporate synthetic targets into real backgrounds. Using this hybrid dataset, we train ATD models specifically tailored to bridge the domain gap between synthetic and real data. Our approach notably relies on massive physics-based data augmentation and Adversarial Training of two deep-learning detection architectures. We then test these models on several datasets, including (1) patchworks of real SAR images, (2) images with real targets incrusted in real backgrounds, and (3) images with synthetic background objects incrusted in real backgrounds. Results show that the produced hybrid datasets are free from image overlay bias. Our approach reaches up to 90% Average Precision on real data while using synthetic targets exclusively for training.
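A toy version of the incrustation step might composite a simulated target chip into a real background with a feathered mask to avoid overlay seams; the blending rule below is ours, not the paper's pipeline.

```python
# A toy incrustation sketch: paste a simulated target chip into a real SAR
# background at (top, left), blending with a soft mask.
import numpy as np

def incrust(background: np.ndarray, target_chip: np.ndarray,
            top: int, left: int, feather: float = 0.2) -> np.ndarray:
    h, w = target_chip.shape
    # Soft mask: 1 in the chip centre, fading to 0 at the borders.
    ys = np.minimum(np.arange(h), np.arange(h)[::-1]) / (feather * h)
    xs = np.minimum(np.arange(w), np.arange(w)[::-1]) / (feather * w)
    mask = np.clip(np.outer(ys, xs), 0.0, 1.0)
    out = background.copy()
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = mask * target_chip + (1 - mask) * region
    return out
```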
https://arxiv.org/abs/2405.09588
Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1\% on Tiny-ImageNet, 9.0\% on ImageNet-1K, and 7.3\% on ImageNet-21K. The source code will be released to the community.
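The simple-to-complex curriculum can be sketched as a growing pool of easy examples ranked by a difficulty proxy; this scheduler is an illustration of the general idea, not the paper's exact procedure.

```python
# A minimal curriculum scheduler: widen the pool of eligible examples from
# the easiest fraction to the full set as training progresses.
def curriculum_pool(scores, step, total_steps, min_frac=0.2):
    """scores: per-example difficulty (low = easy). Returns indices of the
    easiest fraction, growing linearly from min_frac to 1.0."""
    frac = min_frac + (1.0 - min_frac) * step / max(total_steps, 1)
    k = max(1, int(frac * len(scores)))
    return sorted(range(len(scores)), key=scores.__getitem__)[:k]
```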
https://arxiv.org/abs/2405.09150
Sensor placement optimization methods have been studied extensively. They apply to a wide range of problems, including surveillance of known environments, optimal locations for 5G towers, and placement of missile defense systems. However, few works explore the robustness and efficiency of the resulting sensor network with respect to sensor failure or adversarial attacks. This paper addresses this issue by optimizing for the least number of sensors needed to achieve multiple coverage of non-simply connected domains, i.e., coverage of every point by a prescribed number of sensors. We introduce a new objective function for the greedy (next-best-view) algorithm to design efficient and robust sensor networks and derive theoretical bounds on the network's optimality. We further introduce a Deep Learning model to accelerate the algorithm for near real-time computation. The Deep Learning model requires the generation of training examples; correspondingly, we show that understanding the geometric properties of the training data set provides important insights into the performance and training process of deep learning techniques. Finally, we demonstrate that a simple parallel version of the greedy approach using a simpler objective can be highly competitive.
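The greedy (next-best-view) baseline for multiple coverage is easy to sketch under simplifying assumptions (a precomputed binary visibility matrix and a plain marginal-gain rule, rather than the paper's richer objective):

```python
# Greedy k-coverage on a discretized domain: repeatedly pick the candidate
# site covering the most still-deficient grid points.
import numpy as np

def greedy_k_coverage(visibility: np.ndarray, k: int, budget: int) -> list[int]:
    """visibility: (sites, points) 0/1 matrix; visibility[i, j] = 1 iff
    candidate site i covers grid point j. Picks sites until every point is
    covered k times or the budget is exhausted."""
    coverage = np.zeros(visibility.shape[1], dtype=int)
    available = np.ones(visibility.shape[0], dtype=bool)
    chosen: list[int] = []
    for _ in range(budget):
        deficit = (coverage < k).astype(int)   # points still lacking coverage
        gains = visibility @ deficit           # marginal gain of each site
        gains[~available] = -1
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break                              # full k-coverage achieved
        chosen.append(best)
        available[best] = False
        coverage += visibility[best]
    return chosen
```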
https://arxiv.org/abs/2405.09096
Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Aligning aligns target domain statistics with those of the pretrained encoder, preserving robustness while accommodating domain shifts. Through extensive experiments on diverse datasets and domain shift scenarios, including noise-induced shifts and cognitive domain adaptation tasks, we demonstrate AD-Aligning's superior performance compared to existing methods such as Deep Coral and ADDA. Our findings highlight AD-Aligning's ability to emulate the nuanced cognitive processes inherent in human perception, making it a promising solution for real-world applications requiring adaptable and robust domain adaptation strategies.
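The Coral component is the standard Deep CORAL loss, which penalizes the Frobenius distance between source and target feature covariances; how AD-Aligning combines it with its other losses is not shown here.

```python
# The textbook Deep CORAL loss; its integration into AD-Aligning's
# pretraining is an assumption based on the abstract.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (n, d) feature batches. Align second-order statistics
    by penalizing the distance between feature covariances."""
    d = source.size(1)

    def cov(x):
        x = x - x.mean(dim=0, keepdim=True)
        return x.t() @ x / (x.size(0) - 1)

    return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)
```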
https://arxiv.org/abs/2405.09582
Large multimodal models (LMMs) have proven flexible and generalisable across many tasks and fields. Although they have strong potential to aid scientific research, their capabilities in this domain are not well characterised. A key aspect of scientific research is the ability to understand and interpret figures, which serve as a rich, compressed source of complex information. In this work, we present SciFIBench, a scientific figure interpretation benchmark. Our main benchmark consists of a 1000-question gold set of multiple-choice questions split between two tasks across 12 categories. The questions are curated from CS arXiv paper figures and captions, using adversarial filtering to find hard negatives and human verification for quality control. We evaluate 26 LMMs on SciFIBench, finding it to be a challenging benchmark. Finally, we investigate the alignment and reasoning faithfulness of the LMMs on augmented question sets from our benchmark. We release SciFIBench to encourage progress in this domain.
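Adversarial filtering for hard negatives can be sketched with an embedding model: distractors are the pool captions most similar to the true answer. The `embed` function and the similarity rule below are assumptions; SciFIBench's exact filter may differ.

```python
# A sketch of hard-negative mining for multiple-choice construction,
# assuming a sentence embedding model `embed(texts) -> (n, d) array`.
import numpy as np

def hard_negatives(answer: str, pool: list[str], embed, k: int = 4) -> list[str]:
    vecs = np.asarray(embed([answer] + pool), dtype=float)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs[1:] @ vecs[0]      # cosine similarity to the true answer
    order = np.argsort(-sims)      # most similar first = hardest distractors
    return [pool[i] for i in order[:k]]
```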
https://arxiv.org/abs/2405.08807
Reports regarding the misuse of $\textit{Generative AI}$ ($\textit{GenAI}$) to create harmful deepfakes are emerging daily. Recently, defensive watermarking, which enables $\textit{GenAI}$ providers to hide fingerprints in their images for later use in deepfake detection, has been on the rise. Yet, its potential has not been fully explored. We present $\textit{UnMarker}$ -- the first practical $\textit{universal}$ attack on defensive watermarking. Unlike existing attacks, $\textit{UnMarker}$ requires no detector feedback, no unrealistic knowledge of the scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, it is the product of an in-depth analysis of the watermarking paradigm revealing that robust schemes must construct their watermarks in the spectral amplitudes; $\textit{UnMarker}$ employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against the $\textit{SOTA}$ prove its effectiveness, not only defeating traditional schemes while retaining superior quality compared to existing attacks but also breaking $\textit{semantic}$ watermarks that alter the image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, $\textit{UnMarker}$ is the first practical attack on $\textit{semantic}$ watermarks, which have been deemed the future of robust watermarking. $\textit{UnMarker}$ casts doubt on the very potential of this countermeasure and exposes its paradoxical nature, as designing schemes for robustness inevitably compromises other robustness aspects.
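The spectral-amplitude insight can be illustrated with a toy, non-adversarial perturbation that jitters FFT magnitudes while preserving phases (which carry most structural content); UnMarker's actual optimizations are adversarial and considerably more involved.

```python
# A toy spectral-domain perturbation: randomly jitter FFT amplitudes,
# where robust watermarks are argued to live, and keep phases intact.
import numpy as np

def perturb_spectral_amplitudes(img: np.ndarray, strength: float = 0.05,
                                seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    spec = np.fft.fft2(img, axes=(0, 1))
    mag, phase = np.abs(spec), np.angle(spec)
    mag *= 1.0 + strength * rng.standard_normal(mag.shape)  # jitter amplitudes
    out = np.fft.ifft2(mag * np.exp(1j * phase), axes=(0, 1)).real
    return np.clip(out, 0.0, 1.0)
```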
https://arxiv.org/abs/2405.08363
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on the spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10%, respectively, when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce attack success significantly.
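A minimal evaluation harness for attack success rate might look as follows; `slm`, `judge`, and `perturb` are placeholder callables, not the paper's API.

```python
# An attack counts as successful when the safety judge flags the SLM's
# response to a perturbed audio prompt as harmful.
def attack_success_rate(slm, judge, audio_prompts, perturb) -> float:
    hits = sum(bool(judge(slm(perturb(a)))) for a in audio_prompts)
    return hits / len(audio_prompts)
```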
https://arxiv.org/abs/2405.08317
The uses of machine learning (ML) have snowballed in recent years. In many cases, ML models are highly complex, and their operation is beyond the understanding of human decision-makers. Nevertheless, some uses of ML models involve high-stakes and safety-critical applications. Explainable artificial intelligence (XAI) aims to help human decision-makers understand the operation of such complex ML models, thus eliciting trust in their operation. Unfortunately, the majority of past XAI work is based on informal approaches that offer no guarantees of rigor. Unsurprisingly, there exists comprehensive experimental and theoretical evidence confirming that informal methods of XAI can provide human decision-makers with erroneous information. Logic-based XAI represents a rigorous approach to explainability; it is model-based and offers the strongest guarantees of rigor of computed explanations. However, a well-known drawback of logic-based XAI is the complexity of logic reasoning, especially for highly complex ML models. Recent work proposed distance-restricted explanations, i.e., explanations that are rigorous provided the distance to a given input is small enough. Distance-restricted explainability is tightly related to adversarial robustness and has been shown to scale to moderately complex ML models, but the number of inputs still represents a key limiting factor. This paper investigates novel algorithms for scaling up the performance of logic-based explainers when computing and enumerating ML model explanations with a large number of inputs.
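The standard deletion-based loop for computing one subset-minimal (abductive) explanation carries over to the distance-restricted setting if the oracle checks robustness only within the prescribed distance; the sketch below assumes such a `robust(fixed)` oracle (e.g., backed by a verifier).

```python
# Deletion-based computation of a subset-minimal explanation: drop each
# feature and keep it dropped if the prediction stays robust without it.
def distance_restricted_axp(features: list[int], robust) -> list[int]:
    """`robust(fixed)` is assumed to return True iff fixing the features in
    `fixed` keeps the prediction unchanged for all inputs within the given
    distance of the instance being explained."""
    fixed = list(features)
    for f in list(fixed):
        trial = [g for g in fixed if g != f]
        if robust(trial):    # still robust without f, so f is redundant
            fixed = trial
    return fixed             # subset-minimal set of fixed features
```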
https://arxiv.org/abs/2405.08297
Deep learning-based medical image segmentation models often face performance degradation when deployed across medical centers, largely due to discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalization (BN) layers or employing prompt and adversarial learning, which may not effectively rectify the inconsistencies arising from divergent data distributions. In this paper, we propose a novel Human-in-the-loop TTA (HiTTA) framework that stands out in two significant ways. First, it capitalizes on the largely overlooked potential of clinician-corrected predictions, integrating these corrections into the TTA process to steer the model towards predictions that coincide more closely with clinical annotation preferences. Second, our framework introduces a divergence loss, designed specifically to diminish the prediction divergence instigated by domain disparities, through careful calibration of BN parameters. HiTTA is distinguished by its dual-faceted capability to adapt to the distribution of test data while ensuring that the model's predictions align with clinical expectations, thereby enhancing its relevance in a medical context. Extensive experiments on a public dataset underscore the superiority of HiTTA over existing TTA methods, emphasizing the advantages of integrating human feedback and our divergence loss in enhancing the model's performance and adaptability across diverse medical centers.
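The BN-calibration ingredient can be sketched as blending source running statistics with test-batch statistics via the momentum parameter; this simplification omits HiTTA's clinician corrections and divergence loss.

```python
# Test-time BN recalibration: one forward pass in train mode updates the
# running statistics toward the test batch (a common TTA building block).
import torch

@torch.no_grad()
def calibrate_bn(model: torch.nn.Module, test_batch: torch.Tensor,
                 momentum: float = 0.1) -> None:
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.momentum = momentum  # weight given to test-batch statistics
            m.train()              # enable running-stat updates on forward
    model(test_batch)              # one forward pass updates BN buffers
    model.eval()
```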
https://arxiv.org/abs/2405.08270
Synthesizing high-quality photorealistic images conditioned on textual descriptions is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between images and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GANs to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, leaving global textual information unavailable to other layers. To address this issue, we first jointly model CAT and a recurrent neural network (RAT) so that different layers can access global information. We then introduce shuffle attention between RAT blocks to mitigate the information forgetting characteristic of recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model CLIP, which has been extensively employed to establish associations between text and images by learning multimodal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments on the CUB, Oxford, and CelebA-tiny datasets demonstrate the superiority of the proposed model over current state-of-the-art models. The code is available at this https URL.
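A minimal conditional affine transformation (CAT) block of the kind the abstract describes is shown below: an MLP maps the text condition to per-channel scale and shift applied to the feature map; layer sizes are illustrative.

```python
# A minimal CAT block: condition-dependent per-channel affine modulation.
import torch
import torch.nn as nn

class ConditionalAffine(nn.Module):
    def __init__(self, channels: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map; cond: (B, cond_dim) text embedding.
        gamma, beta = self.mlp(cond).chunk(2, dim=1)
        return x * (1 + gamma[..., None, None]) + beta[..., None, None]
```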
https://arxiv.org/abs/2405.08114