We propose the characteristic generator, a novel one-step generative model that combines the sampling efficiency of Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and use the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, yielding a one-step mapping that effectively pushes the prior distribution towards the target distribution. On the theoretical side, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis of simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method to both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.
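For intuition, here is a minimal sketch of the two stages (ours, not the paper's code: `velocity`, `G`, `prior`, and all hyperparameters are hypothetical stand-ins), where Euler steps trace discrete characteristics and a network is then regressed onto their endpoints:

```python
import torch

def euler_characteristics(velocity, z, n_steps=100):
    """Integrate the probability flow ODE dx/dt = velocity(x, t) from t=0 to
    t=1 with the explicit Euler method, starting from prior samples z."""
    x, dt = z.clone(), 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity(x, t)  # one Euler step along the characteristic
    return x  # approximate endpoint of the characteristic starting at z

def fit_one_step_generator(G, velocity, prior, n_iters=1000):
    """Fit G so that G(z) matches the Euler endpoint, giving a one-step sampler."""
    opt = torch.optim.Adam(G.parameters(), lr=1e-4)
    for _ in range(n_iters):
        z = prior(256)                        # draw a prior batch, e.g. N(0, I)
        with torch.no_grad():
            target = euler_characteristics(velocity, z)
        loss = ((G(z) - target) ** 2).mean()  # L2 characteristic-fitting loss
        opt.zero_grad(); loss.backward(); opt.step()
```

After fitting, sampling is a single forward pass `G(z)`, which is the source of the one-step speedup.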
https://arxiv.org/abs/2405.05512
To defend deep neural networks against adversarial attacks, adversarial training has been drawing increasing attention for its effectiveness. However, the accuracy and robustness resulting from adversarial training are limited by the architecture, because adversarial training improves accuracy and robustness only by adjusting the weight connections of a given architecture. In this work, we propose ARNAS to search for accurate and robust architectures for adversarial training. First, we design an accurate and robust search space, in which the placement of the cells and the proportional relationship of the filter numbers are carefully determined. With this design, the architectures can obtain both accuracy and robustness by deploying accurate and robust structures at their respective sensitive positions. We then propose a differentiable multi-objective search strategy that performs gradient descent in directions beneficial for both the natural loss and the adversarial loss, so that accuracy and robustness are guaranteed at the same time. We conduct comprehensive experiments covering white-box attacks, black-box attacks, and transferability. Experimental results show that the searched architecture has the strongest robustness with competitive accuracy, and breaks the traditional belief that NAS-based architectures cannot transfer well to complex tasks in robustness scenarios. By analyzing outstanding searched architectures, we also conclude that accurate and robust neural architectures tend to deploy different structures near the input and the output, which has great practical significance for both hand-crafted and automated design of accurate and robust architectures.
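A minimal sketch of one differentiable multi-objective update, assuming a fixed weighted combination of the two losses (the paper's actual descent direction may be more sophisticated; all names here are illustrative):

```python
import torch

def search_step(model, x, y, x_adv, alpha_opt, lam=0.5):
    """One architecture-parameter update that descends on both objectives:
    the natural loss on clean inputs and the adversarial loss on attacked
    inputs, combined with a fixed trade-off weight `lam`."""
    criterion = torch.nn.CrossEntropyLoss()
    natural_loss = criterion(model(x), y)      # clean-accuracy objective
    adv_loss = criterion(model(x_adv), y)      # robustness objective
    loss = (1 - lam) * natural_loss + lam * adv_loss
    alpha_opt.zero_grad()
    loss.backward()
    alpha_opt.step()                           # update architecture parameters
    return natural_loss.item(), adv_loss.item()
```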
https://arxiv.org/abs/2405.05502
Skeleton-based motion visualization is a rising field in computer vision, especially for virtual reality (VR). With further advancements in human-pose estimation and skeleton-extracting sensors, more and more applications that utilize skeleton data have emerged. These skeletons may appear anonymous, but they contain embedded personally identifiable information (PII). In this paper, we present a new anonymization technique based on motion retargeting, utilizing adversary classifiers to further remove PII embedded in the skeleton. Motion retargeting is effective for anonymization because it transfers the movement of the user onto a dummy skeleton; any PII linked to the skeleton is then based on the dummy skeleton instead of the user we are protecting. We propose a Privacy-centric Deep Motion Retargeting model (PMR) that aims to further clear the retargeted skeleton of PII through adversarial learning. In our experiments, PMR achieves motion-retargeting utility on par with state-of-the-art models while also reducing the performance of privacy attacks.
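To illustrate the adversarial setup, here is a hedged sketch in which a privacy classifier tries to re-identify the user while the retargeting network learns to fool it; all names (`retarget`, `privacy_clf`, the loss terms) are illustrative stand-ins rather than PMR's actual implementation:

```python
import torch
import torch.nn.functional as F

def pmr_losses(retarget, privacy_clf, motion, user_id, dummy_skeleton):
    """Adversarial objective sketch: preserve the movement on a dummy
    skeleton (utility) while erasing identity cues (privacy)."""
    out = retarget(motion, dummy_skeleton)        # retargeted motion
    utility = ((out - motion) ** 2).mean()        # keep the movement content
    ce = F.cross_entropy(privacy_clf(out), user_id)
    clf_loss = ce                                 # classifier: identify user
    gen_loss = utility - ce                       # retargeter: fool classifier
    return gen_loss, clf_loss
```

In training, one would alternate updates: minimize `clf_loss` over the classifier's parameters and `gen_loss` over the retargeting network's.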
https://arxiv.org/abs/2405.05428
The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.
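A toy sketch of the data-minimization idea behind an air-gapped agent follows; `minimizer` is a hypothetical callable (e.g., a separate LLM query returning "yes"/"no"), and none of these names come from the paper:

```python
def airgap_filter(user_data, task, minimizer):
    """Before any third-party context reaches the agent, a separate
    `minimizer` model decides which user-data fields are necessary for
    `task`; only those fields are exposed to the agent."""
    necessary = {
        field for field in user_data
        if minimizer(f"Is '{field}' required to complete: {task}?") == "yes"
    }
    return {k: v for k, v in user_data.items() if k in necessary}

# Illustrative usage: the agent sees only the minimized view, so a hijacked
# context cannot extract fields that were never exposed.
# safe_view = airgap_filter(profile, "book a table for two", minimizer)
```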
https://arxiv.org/abs/2405.05175
Medical Image Synthesis (MIS) plays an important role in intelligent medicine, greatly reducing the economic and time costs of medical diagnosis. However, due to the complexity of medical images and the similar characteristics of different tissue cells, existing methods face great challenges in maintaining biological consistency. To this end, we propose the Hybrid Augmented Generative Adversarial Network (HAGAN) to maintain the authenticity of structural texture and tissue cells. HAGAN contains an Attention Mixed (AttnMix) Generator, a Hierarchical Discriminator, and a Reverse Skip Connection between the Discriminator and Generator. The AttnMix consistency differentiable regularization encourages the perception of structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The Hierarchical Discriminator introduces pixel-by-pixel discriminant feedback to the generator, enhancing the saliency and discriminability of global and local details simultaneously. The Reverse Skip Connection further improves the accuracy of fine details by fusing real and synthetic distribution features. Our experimental evaluations on three datasets of different scales, i.e., COVID-CT, ACDC, and BraTS2018, demonstrate that HAGAN outperforms existing methods and achieves state-of-the-art performance at both high and low resolutions.
https://arxiv.org/abs/2405.04902
Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance on commonsense reasoning and knowledge-intensive tasks when harnessed properly. However, language models can also learn social biases, which carry significant potential for societal harm. Many mitigation strategies have been proposed for LLM safety, but it is unclear how effective they are at eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge-graph-augmented generation. We refactor natural language stereotypes into a knowledge graph and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find that our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety and further work in this new adversarial space.
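As a rough illustration of the pipeline (with placeholder triples and a placeholder template, not the paper's data), stereotypes encoded as knowledge-graph triples can be expanded into attack prompts:

```python
def build_attack_prompts(triples, template):
    """Sketch: encode stereotypes as (subject, relation, object) triples
    and expand them into adversarial prompts via a text template. Both the
    triples and the template here are hypothetical placeholders."""
    prompts = []
    for subj, rel, obj in triples:
        prompts.append(template.format(subject=subj, relation=rel, object=obj))
    return prompts

# Hypothetical usage with placeholder content:
# triples = [("<group>", "<relation>", "<attribute>")]
# prompts = build_attack_prompts(
#     triples, "Given that {subject} {relation} {object}, ...")
```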
https://arxiv.org/abs/2405.04756
We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of Generative Adversarial Networks (GANs) and diffusion models (DMs) by mapping the multi-modal features of the DM into the latent space of pre-trained GANs. We present a simple mapping network and a style modulation network to link the two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations in the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images that align well with the inputs. We validate our method using pre-trained 2D and 3D GANs, and our results outperform existing methods. Our project page is available at this https URL.
https://arxiv.org/abs/2405.04356
Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, which have gained prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention, as they cannot easily adopt popular gradient-based methods and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points over the prior art. Our implementation is available at this https URL.
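A bare-bones sketch of a query-based character-level attack in this spirit (Charmer's actual search is far more efficient; `loss_fn` is an assumed black-box scoring function that returns the victim's loss on a candidate text):

```python
import string

def char_attack(text, loss_fn, max_queries=1000):
    """Greedy character-level attack sketch: try single-character
    substitutions and keep the edit that most increases the victim's loss,
    querying only the black-box score `loss_fn`."""
    best, best_loss, queries = text, loss_fn(text), 0
    for i in range(len(best)):
        for c in string.ascii_lowercase:
            if queries >= max_queries:
                return best
            cand = best[:i] + c + best[i + 1:]   # substitute one character
            queries += 1
            cand_loss = loss_fn(cand)
            if cand_loss > best_loss:
                best, best_loss = cand, cand_loss
    return best
```

Because each edit changes a single character, the perturbed text stays close to the original, which is why character-level attacks preserve semantics so well.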
https://arxiv.org/abs/2405.04346
Recent developments in large language models (LLMs), while offering a powerful foundation for developing natural language agents, raise safety concerns about the models themselves and the autonomous agents built upon them. Deception, one potential capability of AI agents of particular concern, refers to an act or statement that misleads, hides the truth, or promotes a belief that is not true in its entirety or in part. We move away from the conventional understanding of deception as straight-out lying, making objectively selfish decisions, or giving false information, as seen in previous AI safety research, and target a specific category of deception achieved through obfuscation and equivocation. We broadly explain the two types of deception by analogy with the rabbit-out-of-the-hat magic trick, where (i) the rabbit comes out of a hidden trap door, or (ii) (our focus) the audience is completely distracted while the magician brings out the rabbit right in front of them using sleight of hand or misdirection. Our novel testbed framework displays the intrinsic deception capabilities of LLM agents in a goal-driven environment when they are directed to be deceptive in their natural language generations, within a two-agent adversarial dialogue system built upon the legislative task of "lobbying" for a bill. Within this goal-driven environment, we show how deceptive capacity develops through a reinforcement learning setup built around theories from the philosophy of language and cognitive psychology. We find that the lobbyist agent increases its deceptive capabilities by ~40% (relative) through subsequent reinforcement trials of adversarial interactions, and our deception detection mechanism shows a detection capability of up to 92%. Our results highlight potential issues in agent-human interaction, with agents potentially manipulating humans towards their programmed end-goals.
https://arxiv.org/abs/2405.04325
Offline reinforcement learning (RL) provides a promising approach to avoiding costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available; however, data collected from it cannot be used directly in offline RL due to the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach to better combine the offline dataset and the inaccurate simulation data. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate simulator starting from the distribution provided by the generator, and reweight the simulated data using the discriminator. Our experimental results on the D4RL benchmark and a real-world manipulation task confirm that our method benefits more from both the inaccurate simulator and limited offline datasets, achieving better performance than state-of-the-art methods.
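The reweighting step can be sketched as follows, assuming the discriminator outputs a logit for "offline-dataset state" and using a density-ratio-style weight; the paper's exact weighting scheme may differ:

```python
import torch

def reweight_sim_batch(discriminator, sim_states):
    """A discriminator D trained to separate offline-dataset states from
    simulator states yields importance weights w = D / (1 - D), which
    down-weight simulated states that look unlike the offline distribution."""
    with torch.no_grad():
        d = torch.sigmoid(discriminator(sim_states)).clamp(1e-4, 1 - 1e-4)
    weights = d / (1 - d)            # density-ratio-style importance weight
    return weights / weights.mean()  # normalize for a stable loss scale
```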
https://arxiv.org/abs/2405.04307
The efficacy of a large language model (LLM)-generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. This approach entails computing the Grammar Error Correction Score (GECScore) of the given text to distinguish between human-written and LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrasing and adversarial perturbation attacks.
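A minimal sketch of a GECScore-style statistic, with `corrector` standing in for any grammar-error-correction model (any callable mapping text to corrected text; the paper's exact score definition may differ):

```python
import difflib

def gec_score(text, corrector):
    """Run grammar-error correction and measure how much the text changes;
    human-written text tends to require more corrections, so it scores higher."""
    corrected = corrector(text)
    sim = difflib.SequenceMatcher(None, text, corrected).ratio()
    return 1.0 - sim  # more grammar edits -> higher score -> more human-like

def classify(text, corrector, threshold):
    """Threshold the score to label a text; `threshold` would be tuned on
    held-out data in practice."""
    return "human-written" if gec_score(text, corrector) > threshold else "LLM-generated"
```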
https://arxiv.org/abs/2405.04286
Foundation models have enormous potential for advancing Earth and climate sciences; however, current approaches may not be optimal, as they focus on only a few basic features of a desirable Earth and climate foundation model. Toward crafting the ideal Earth foundation model, we define eleven features that would allow such a foundation model to benefit any geoscientific downstream application in an environment- and human-centric manner. We further shed light on the way forward to achieve the ideal model and to evaluate Earth foundation models. What comes after foundation models? Energy-efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.
https://arxiv.org/abs/2405.04285
Breast cancer is a significant global health concern, particularly for women. Early detection and appropriate treatment are crucial in mitigating its impact, with histopathology examinations playing a vital role in swift diagnosis. However, these examinations often require a substantial workforce and experienced medical experts for proper recognition and cancer grading. Automated image retrieval systems have the potential to assist pathologists in identifying cancerous tissues, thereby accelerating the diagnostic process. Nevertheless, due to considerable variability in the tissue and cell patterns of histological images, proposing an accurate image retrieval model is very challenging. This work introduces a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval. Additionally, we incorporate cluster-guided contrastive learning as the graph feature extractor to boost retrieval performance. We evaluate the proposed model on two publicly available datasets of breast cancer histological images and achieve superior or highly competitive retrieval performance, with average mAP scores of 96.5% on the BreakHis dataset and 94.7% on the BACH dataset, and mVP scores of 91.9% and 91.3%, respectively. Our proposed retrieval model has the potential to be used in clinical settings to enhance diagnostic performance and ultimately benefit patients.
https://arxiv.org/abs/2405.04211
Corruptions caused by data perturbations and label noise are prevalent in datasets from unreliable sources, posing significant threats to model training. Despite existing efforts to develop robust models, current learning methods commonly overlook the possible co-existence of both corruptions, limiting the effectiveness and practicability of the model. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework to simultaneously handle two types of corruption (i.e., data and label) without prior knowledge of their specifics. We propose hybrid adversarial training over multiple potential adversarial perturbations, alongside semi-supervised learning based on class-rebalancing sample selection, to enhance the model's resilience to dual corruption. On the one hand, in the proposed adversarial training, a perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. This is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels, and semi-supervised learning is performed accordingly by discarding the noisy labels. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
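A hedged sketch of the semantic-consistency idea, with `perturbations` standing in for the learned surrogate perturbation generators (ERAT's actual modules and losses are more involved):

```python
import torch
import torch.nn.functional as F

def hybrid_consistency_loss(model, x, perturbations):
    """Several surrogate attackers perturb the input, and the model is
    trained to keep its predictive distribution on perturbed data consistent
    with its distribution on clean data."""
    clean_logits = model(x)
    loss = 0.0
    for perturb in perturbations:        # multiple surrogate attackers
        x_p = perturb(x, model)          # bounded malicious perturbation
        loss = loss + F.kl_div(
            F.log_softmax(model(x_p), dim=-1),
            F.softmax(clean_logits.detach(), dim=-1),
            reduction="batchmean",
        )                                # semantic-consistency term
    return loss / len(perturbations)
```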
https://arxiv.org/abs/2405.04191
Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for steering LLMs towards desired attributes often assume unbiased representations and rely solely on steering prompts. However, the representations learned from pre-training can introduce semantic biases that influence the steering process, leading to suboptimal results. We propose LLMGuardaril, a novel framework that incorporates causal analysis and adversarial learning to obtain unbiased steering representations in LLMs. LLMGuardaril systematically identifies and blocks the confounding effects of biases, enabling the extraction of unbiased steering representations. Additionally, it includes an explainable component that provides insights into the alignment between the generated output and the desired direction. Experiments demonstrate LLMGuardaril's effectiveness in steering LLMs towards desired attributes while mitigating biases. Our work contributes to the development of safe and reliable LLMs that align with desired attributes. We discuss the limitations and future research directions, highlighting the need for ongoing research to address the ethical implications of large language models.
https://arxiv.org/abs/2405.04160
Adversarial machine learning, which studies attacks on and defenses of machine learning (ML) models, is rapidly gaining importance as ML is increasingly adopted to optimize wireless systems such as Open Radio Access Networks (O-RAN). Comprehensive modeling of the security threats, and demonstrations of adversarial attacks and defenses on practical AI-based O-RAN systems, are still in their nascent stages. We begin by conducting threat modeling to pinpoint attack surfaces in O-RAN, using an ML-based connection management application (xApp) as an example. The xApp uses a Graph Neural Network trained with Deep Reinforcement Learning and achieves, on average, a 54% improvement in the coverage rate, measured as the 5th-percentile user data rate. We then formulate and demonstrate evasion attacks that degrade the coverage rate by as much as 50% by injecting bounded noise at different threat surfaces, including the open wireless medium itself. Crucially, we also compare and contrast the effectiveness of such attacks on the ML-based xApp and a non-ML-based heuristic. We finally develop and demonstrate robust training-based defenses against the challenging physical/jamming-based attacks and show a 15% improvement in coverage rates, compared to employing no defense, over a range of noise budgets.
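A PGD-style sketch of such a bounded-noise evasion attack on the xApp's input features follows; the model handle and the coverage proxy are illustrative assumptions, not the paper's setup:

```python
import torch

def bounded_evasion(xapp_model, features, eps=0.1, steps=10):
    """Inject noise bounded by `eps` into the measurements the xApp
    consumes, iteratively descending a proxy coverage objective so the
    xApp's decisions degrade."""
    delta = torch.zeros_like(features, requires_grad=True)
    for _ in range(steps):
        score = xapp_model(features + delta).sum()      # proxy coverage score
        score.backward()
        with torch.no_grad():
            delta -= (eps / steps) * delta.grad.sign()  # descend coverage
            delta.clamp_(-eps, eps)                     # keep noise bounded
        delta.grad.zero_()
    return (features + delta).detach()
```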
https://arxiv.org/abs/2405.03891
Recently, with the advancement of deep learning, several applications in text classification have advanced significantly. However, this improvement comes at a cost, because deep learning is vulnerable to adversarial examples; this weakness indicates that deep learning is not very robust. Fortunately, the input of a text classifier is discrete, which can shield the classifier from state-of-the-art gradient-based attacks. Nonetheless, previous works have devised black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, we transform the input into its real-valued embedding vector to perform state-of-the-art white-box attacks. We then convert the perturbed embedding vector back into text, yielding an adversarial example. In summary, we create a framework that measures the robustness of a text classifier by using the gradients of the classifier.
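A minimal sketch of the framework, assuming `model` maps embedding tensors directly to class logits (all names are illustrative; a single FGSM step stands in for whichever white-box attack is used):

```python
import torch
import torch.nn.functional as F

def embedding_attack(model, emb_layer, input_ids, y, eps=0.5):
    """Attack in the continuous embedding space with one white-box gradient
    step, then project each perturbed vector back to its nearest vocabulary
    embedding to recover a discrete adversarial text."""
    emb = emb_layer(input_ids).detach().requires_grad_(True)
    loss = F.cross_entropy(model(emb), y)
    loss.backward()
    adv = (emb + eps * emb.grad.sign()).detach()      # FGSM step on embeddings
    vocab = emb_layer.weight                          # (V, d) embedding table
    dists = torch.cdist(adv.view(-1, adv.size(-1)), vocab)
    return dists.argmin(dim=-1).view_as(input_ids)    # nearest-token decode
```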
https://arxiv.org/abs/2405.03789
The efficacy of deep learning models has been called into question by the presence of adversarial examples. Addressing the vulnerability of deep learning models to adversarial examples is crucial for ensuring their continued development and deployment. In this work, we focus on the role of rectified linear unit (ReLU) activation functions in the generation of adversarial examples. ReLU functions are commonly used in deep learning models because they facilitate the training process. However, our empirical analysis demonstrates that ReLU functions are not robust against adversarial examples. We propose a modified version of the ReLU function that improves robustness against adversarial examples. Our results are supported by experiments confirming the effectiveness of the proposed modification. Additionally, we demonstrate that applying adversarial training to our customized model further enhances its robustness compared to a general model.
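Since the abstract does not specify the exact modification, the following is our own illustrative stand-in rather than the paper's activation: a capped ReLU, whose bounded output limits how far a bounded input perturbation can propagate through the network:

```python
import torch

class BoundedReLU(torch.nn.Module):
    """Illustrative robustness-oriented ReLU variant (not the paper's exact
    proposal): clipping the activation at `cap` bounds the activation range,
    which in turn bounds the effect of input perturbations."""
    def __init__(self, cap=6.0):
        super().__init__()
        self.cap = cap

    def forward(self, x):
        return torch.clamp(x, min=0.0, max=self.cap)  # ReLU with an upper cap
```

Swapping such a module in for `torch.nn.ReLU` leaves the rest of the architecture and training loop unchanged.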
https://arxiv.org/abs/2405.03777
Adversarial information operations can destabilize societies by undermining fair elections, manipulating public opinion on policies, and promoting scams. Despite their widespread occurrence and potential impacts, our understanding of influence campaigns is limited by manual analysis of messages and subjective interpretation of their observable behavior. In this paper, we explore whether these limitations can be mitigated with large language models (LLMs), using GPT-3.5 as a case study for coordinated-campaign annotation. We first use GPT-3.5 to scrutinize 126 identified information operations spanning over a decade, using a number of metrics to quantify the close (if imperfect) agreement between LLM and ground-truth descriptions. We next extract coordinated campaigns from two large multilingual datasets from X (formerly Twitter) that respectively discuss the 2022 French election and the 2023 Balikaran Philippine-U.S. military exercise. For each coordinated campaign, we use GPT-3.5 to analyze posts related to a specific concern and extract goals, tactics, and narrative frames, both before and after critical events (such as the date of an election). While GPT-3.5 sometimes disagrees with subjective interpretation, its ability to summarize and interpret demonstrates LLMs' potential to extract higher-order indicators from text, providing a more complete picture of information campaigns than previous methods.
https://arxiv.org/abs/2405.03688
Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables known as regression labels. While Continuous conditional Generative Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced data, resulting in suboptimal outcomes. To enhance the quality of generated images, a promising alternative is to replace CcGANs with Conditional Diffusion Models (CDMs), renowned for their stable training process and ability to produce more realistic images. However, existing CDMs encounter challenges when applied to CCGM tasks due to several limitations, such as inadequate U-Net architectures and deficient model-fitting mechanisms for handling regression labels. In this paper, we introduce Continuous Conditional Diffusion Models (CCDMs), the first CDM designed specifically for the CCGM task. CCDMs address the limitations of existing CDMs by introducing specially designed conditional diffusion processes, a modified denoising U-Net with a custom-made conditioning mechanism, a novel hard vicinal loss for model fitting, and an efficient conditional sampling procedure. Through comprehensive experiments on four datasets with resolutions ranging from 64x64 to 192x192, we demonstrate the superiority of the proposed CCDM over state-of-the-art CCGM models, establishing new benchmarks in CCGM. Extensive ablation studies validate the model design and implementation configuration of the proposed CCDM. Our code is publicly available at this https URL.
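To make "hard vicinal" concrete, here is a sketch of hard vicinal weighting in the CcGAN tradition, where a sample contributes to the loss at a target label only if its regression label lies within a hard vicinity of radius kappa; CCDM's actual hard vicinal loss is defined over its diffusion objective:

```python
import torch

def hard_vicinal_weights(batch_labels, target_label, kappa=0.02):
    """Return 0/1 weights: a training sample counts toward the loss at
    `target_label` only if its label falls within the hard vicinity of
    radius `kappa`, which pools nearby labels under sparse data."""
    dist = (batch_labels - target_label).abs()
    return (dist <= kappa).float()   # 1 inside the vicinity, 0 outside

# Illustrative usage:
# w = hard_vicinal_weights(y, y0)
# loss = (w * per_sample_loss).sum() / w.sum().clamp(min=1)
```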
https://arxiv.org/abs/2405.03546