Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or whether they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on $\textit{how we define memorization}$. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs -- a given string from the training data is considered memorized if it can be elicited by a prompt shorter than the string itself. In other words, these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. We outline the limitations of existing notions of memorization and show how the ACR overcomes these challenges by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing the flexibility to measure memorization for arbitrary strings at reasonably low compute cost. Our definition serves as a valuable and practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios. Project page: this https URL.
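The ACR definition is simple enough to state as code. A minimal sketch follows, with lengths measured in tokens; the function names are ours, not the paper's:

```python
def adversarial_compression_ratio(target_tokens: int, prompt_tokens: int) -> float:
    """ACR = (length of the target string) / (length of the shortest
    adversarial prompt found to elicit it), both counted in tokens."""
    return target_tokens / prompt_tokens

def is_memorized(target_tokens: int, prompt_tokens: int) -> bool:
    # Memorized iff the eliciting prompt is shorter than the string, i.e. ACR > 1.
    return adversarial_compression_ratio(target_tokens, prompt_tokens) > 1.0

# A 100-token training string elicited by a 20-token adversarial prompt:
print(adversarial_compression_ratio(100, 20))  # 5.0 -> memorized
print(is_memorized(100, 120))                  # False: the prompt is longer
```

The hard part in practice is the search for the shortest eliciting prompt, which the sketch abstracts away.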
https://arxiv.org/abs/2404.15146
Diffusion models (DMs) have ushered in a new era of generative modeling and offer more opportunities for efficiently generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, motivating the creation of more effective adversarial attacks on DMs to understand their vulnerabilities. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods, being both more effective (more noise) and more efficient (twice as fast as Anti-DreamBooth and Mist).
https://arxiv.org/abs/2404.15081
Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured as graphs. They have also been applied in safety-critical environments where perturbations inherently occur. Since neural networks are prone to adversarial attacks, these perturbations make it necessary to formally verify neural networks before deploying them in safety-critical environments. While there exists research on the formal verification of neural networks, no prior work verifies the robustness of generic graph convolutional network architectures with uncertainty in both the node features and the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.
https://arxiv.org/abs/2404.15065
This paper puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard without any access to the training data of FL. Such an advancement leads to the VGAE-MP attack that is not only efficacious but also remains elusive to detection. VGAE-MP attack extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and benign models' features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting the attack, posing a severe threat to FL.
https://arxiv.org/abs/2404.15042
Domain adaptive pose estimation aims to enable deep models trained on source domain (synthesized) datasets to produce similar results on the target domain (real-world) datasets. The existing methods have made significant progress by conducting image-level or feature-level alignment. However, aligning at a single level alone is not sufficient to fully bridge the domain gap and achieve excellent domain adaptive results. In this paper, we propose a multi-level domain adaptation approach, which aligns different domains at the image, feature, and pose levels. Specifically, we first utilize image style transfer to ensure that images from the source and target domains have a similar distribution. Subsequently, at the feature level, we employ adversarial training to make the features from the source and target domains preserve domain-invariant characteristics as much as possible. Finally, at the pose level, a self-supervised approach is utilized to enable the model to learn diverse knowledge, implicitly addressing the domain gap. Experimental results demonstrate that significant improvement can be achieved by the proposed multi-level alignment method in pose estimation, which outperforms the previous state of the art in human pose estimation by up to 2.4% and in animal pose estimation by up to 3.1% for dogs and 1.4% for sheep.
https://arxiv.org/abs/2404.14885
High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for efficiently exploring Pareto-optimal and optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. Unfortunately, these resources are limited and may not be sufficient for complex, multi-component system-level explorations. Generating new data using existing HLS benchmarks can be cumbersome, given the expertise and time required to effectively generate data for different HLS designs and directives. As a result, synthetic data has been used in prior work to evaluate system-level HLS DSE. However, the fidelity of the synthetic data to real data is often unclear, leading to uncertainty about the quality of system-level HLS DSE. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments that would be unattainable with only the currently available data. We explore and adapt a Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) for this task and evaluate our approach using state-of-the-art datasets and metrics. We compare our approach to prior works and show that Vaegan effectively generates synthetic HLS data that closely mirrors the ground truth's distribution.
https://arxiv.org/abs/2404.14754
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data can serve to guide robot learning for new tasks in novel environments, informing both "what" to do and "how" to do it. A powerful way to encode both the "what" and the "how" is to infer a well-shaped reward function for reinforcement learning. The challenge is determining how to ground visual demonstration inputs into a well-shaped and informative reward function. We propose a technique Rank2Reward for learning behaviors from videos of tasks being performed without access to any low-level states and actions. We do so by leveraging the videos to learn a reward function that measures incremental "progress" through a task by learning how to temporally rank the video frames in a demonstration. By inferring an appropriate ranking, the reward function is able to guide reinforcement learning by indicating when task progress is being made. This ranking function can be integrated into an adversarial imitation learning scheme resulting in an algorithm that can learn behaviors without exploiting the learned reward function. We demonstrate the effectiveness of Rank2Reward at learning behaviors from raw video on a number of tabletop manipulation tasks in both simulations and on a real-world robotic arm. We also demonstrate how Rank2Reward can be easily extended to be applicable to web-scale video datasets.
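The temporal-ranking idea behind Rank2Reward can be sketched with a standard pairwise logistic ranking loss over frame rewards; the paper's actual loss and network are not reproduced here, so treat this as an illustrative stand-in:

```python
import math

def pairwise_rank_loss(rewards):
    """Logistic ranking loss encouraging r[j] > r[i] for j > i, so that
    rewards increase with task progress. `rewards` is a list of scalar
    rewards predicted for the frames of one demonstration, in time order
    (a stand-in for a learned reward network's outputs); needs >= 2 frames."""
    loss, pairs = 0.0, 0
    for i in range(len(rewards)):
        for j in range(i + 1, len(rewards)):
            # -log sigmoid(r_j - r_i): small when the later frame scores higher
            loss += math.log(1.0 + math.exp(rewards[i] - rewards[j]))
            pairs += 1
    return loss / pairs

# Monotonically increasing rewards incur low loss; reversed order is penalized.
increasing = [0.0, 1.0, 2.0, 3.0]
print(pairwise_rank_loss(increasing) < pairwise_rank_loss(increasing[::-1]))  # True
```

Minimizing such a loss over demonstration frames yields a reward that measures incremental progress, which is the signal the abstract describes feeding into reinforcement learning.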
https://arxiv.org/abs/2404.14735
Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in this https URL.
https://arxiv.org/abs/2404.14700
The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a one-time watermark embedding to deceive unauthorized FR models and allows authorizers to perform identity verification by extracting the watermark. Specifically, we propose an information-guided adversarial attack against FR models. The encoder embeds an identity-specific watermark into the deep feature space of the carrier, guiding recognizable features of the image to deviate from the source identity. We further adopt a collaborative meta-optimization strategy compatible with sub-tasks, which regularizes the joint optimization direction of the encoder and decoder. This strategy enhances the representation of universal carrier features, mitigating multi-objective optimization conflicts in watermarking. Experiments confirm that DPG achieves significant attack success rates and traceability accuracy on state-of-the-art FR models, exhibiting remarkable robustness that outperforms the existing privacy protection methods using adversarial attacks and deep watermarking, or simple combinations of the two. Our work potentially opens up new insights into proactive protection for FR privacy.
https://arxiv.org/abs/2404.14693
Lane detection has enabled highly functional autonomous driving systems to understand driving scenes even in complex environments. In this paper, we work towards developing a generalized computer vision system able to detect lanes without using any annotation. We make the following contributions: (i) We illustrate how to perform unsupervised 3D lane segmentation by leveraging the distinctive intensity of lanes on the LiDAR point cloud frames, and then obtain the noisy lane labels in the 2D plane by projecting the 3D points; (ii) We propose a novel self-supervised training scheme, dubbed LaneCorrect, that automatically corrects the lane label by learning geometric consistency and instance awareness from the adversarial augmentations; (iii) With the self-supervised pre-trained model, we distill it to train a student network for arbitrary target lane (e.g., TuSimple) detection without any human labels; (iv) We thoroughly evaluate our self-supervised method on four major lane detection benchmarks (including TuSimple, CULane, CurveLanes and LLAMAS) and demonstrate excellent performance compared with existing supervised counterparts, whilst showing more effective results on alleviating the domain gap, i.e., training on CULane and testing on TuSimple.
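Contribution (i) — intensity-based selection of lane returns plus 3D-to-2D projection — can be sketched with a pinhole camera model. The threshold and camera intrinsics below are illustrative assumptions, not values from the paper:

```python
# Lane paint is retroreflective, so lane points tend to have high LiDAR
# intensity. Keep only bright returns and project them into the image plane.

def project_lane_points(points, intensity_threshold=0.7, fx=1000.0, fy=1000.0,
                        cx=640.0, cy=360.0):
    """points: iterable of (x, y, z, intensity) in the camera frame, z > 0 forward.
    Returns (u, v) pixel coordinates of the candidate lane points."""
    pixels = []
    for x, y, z, intensity in points:
        if intensity < intensity_threshold or z <= 0:
            continue  # keep only bright returns in front of the camera
        u = fx * x / z + cx  # pinhole projection
        v = fy * y / z + cy
        pixels.append((u, v))
    return pixels

cloud = [(1.0, 0.2, 10.0, 0.9), (0.5, 0.1, 5.0, 0.2), (-1.0, 0.3, 20.0, 0.8)]
print(project_lane_points(cloud))  # ~[(740.0, 380.0), (590.0, 375.0)]
```

The resulting 2D labels are noisy, which is exactly what the LaneCorrect self-supervised correction step is then meant to clean up.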
https://arxiv.org/abs/2404.14671
Artificial intelligence (AI) and neuroscience share a rich history, with advancements in neuroscience shaping the development of AI systems capable of human-like knowledge retention. Leveraging insights from neuroscience and existing research in adversarial and continual learning, we introduce a novel framework comprising two core concepts: feature distillation and re-consolidation. Our framework, named Robust Rehearsal, addresses the challenge of catastrophic forgetting inherent in continual learning (CL) systems by distilling and rehearsing robust features. Inspired by the mammalian brain's memory consolidation process, Robust Rehearsal aims to emulate the rehearsal of distilled experiences during learning tasks. Additionally, it mimics memory re-consolidation, where new experiences influence the integration of past experiences to mitigate forgetting. Extensive experiments conducted on CIFAR10, CIFAR100, and real-world helicopter attitude datasets showcase the superior performance of CL models trained with Robust Rehearsal compared to baseline methods. Furthermore, examining different optimization training objectives-joint, continual, and adversarial learning-we highlight the crucial role of feature learning in model performance. This underscores the significance of rehearsing CL-robust samples in mitigating catastrophic forgetting. In conclusion, aligning CL approaches with neuroscience insights offers promising solutions to the challenge of catastrophic forgetting, paving the way for more robust and human-like AI systems.
https://arxiv.org/abs/2404.14588
Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be used to facilitate fraud or scam schemes, generate and spread misinformation, or produce fabricated artworks. In this paper, we present a systematic attempt at understanding and detecting AI-generated images (AI-art) in adversarial scenarios. First, we collect and share a dataset of real images and their corresponding artificial counterparts generated by four popular AI image generators. The dataset, named ARIA, contains over 140K images in five categories: artworks (painting), social media images, news photos, disaster scenes, and anime pictures. This dataset can be used as a foundation to support future research on adversarial AI-art. Next, we present a user study that employs the ARIA dataset to evaluate whether real-world users can distinguish AI-generated images from real ones, with or without reference images. In a benchmarking study, we further evaluate if state-of-the-art open-source and commercial AI image detectors can effectively identify the images in the ARIA dataset. Finally, we present a ResNet-50 classifier and evaluate its accuracy and transferability on the ARIA dataset.
https://arxiv.org/abs/2404.14581
Tokenization is widely used in large language models because it significantly improves performance. However, tokenization imposes several disadvantages, such as performance biases, increased adversarial vulnerability, decreased character-level modeling performance, and increased modeling complexity. To address these disadvantages without sacrificing performance, we propose SpaceByte, a novel byte-level decoder architecture that closes the performance gap between byte-level and subword autoregressive language modeling. SpaceByte consists of a byte-level Transformer model, but with extra larger transformer blocks inserted in the middle of the layers. We find that performance is significantly improved by applying these larger blocks only after certain bytes, such as space characters, which typically denote word boundaries. Our experiments show that for a fixed training and inference compute budget, SpaceByte outperforms other byte-level architectures and roughly matches the performance of tokenized Transformer architectures.
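The byte-boundary rule ("applying these larger blocks only after certain bytes, such as space characters") can be sketched as follows; the exact byte set SpaceByte uses is an assumption here, not taken from the paper:

```python
def boundary_positions(data: bytes) -> list[int]:
    """Indices where a SpaceByte-style larger ('global') transformer block
    would be applied: positions immediately after a space-like byte, which
    typically start a new word."""
    SPACE_LIKE = {ord(' '), ord('\n'), ord('\t')}  # assumed boundary bytes
    return [i for i in range(1, len(data)) if data[i - 1] in SPACE_LIKE]

text = b"byte level language models"
print(boundary_positions(text))  # [5, 11, 20]
```

Every byte still passes through the small byte-level blocks; only these word-start positions additionally pass through the inserted larger blocks, which is where the compute savings come from.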
https://arxiv.org/abs/2404.14408
Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber, and ensure their optimization with an $H_{\infty}$ constraint. In contrast to the actor, which maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., the "cost", in each iteration. To keep the joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint bounds the ratio of the cost to the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it were a bipedal robot. The simulated quantitative results show improvement against baselines, demonstrating the effectiveness of the method and each design choice. On the other hand, real-robot experiments qualitatively exhibit how robust the policy is when subjected to various disturbances on various terrains, including stairs, high platforms, slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.
https://arxiv.org/abs/2404.14405
To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
https://arxiv.org/abs/2404.14349
Stance detection has been widely studied as the task of determining if a social media post is positive, negative or neutral towards a specific issue, such as support towards vaccines. Research in stance detection has however often been limited to a single language and, where more than one language has been studied, research has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This paper makes the first such effort by introducing a novel approach to zero-shot cross-lingual stance detection, Multilingual Translation-Augmented BERT (MTAB), aiming to enhance the performance of a cross-lingual classifier in the absence of explicit training data for target languages. Our technique employs translation augmentation to improve zero-shot performance and pairs it with adversarial learning to further boost model efficacy. Through experiments on datasets labeled for stance towards vaccines in four languages (English, German, French, and Italian), we demonstrate the effectiveness of our proposed approach, showcasing improved results in comparison to a strong baseline model as well as ablated versions of our model. Our experiments demonstrate the contribution of the model components, not least the translation-augmented data as well as the adversarial learning component, to the improved performance of the model. We have made our source code accessible on GitHub.
https://arxiv.org/abs/2404.14339
Recently, diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks. However, previous studies have used questionable methods to evaluate the robustness of DBP models, and their explanations of DBP robustness also lack experimental support. We re-examine DBP robustness using precise gradients, and discuss the impact of stochasticity on DBP robustness. To better explain DBP robustness, we assess DBP robustness under a novel attack setting, Deterministic White-box, and pinpoint stochasticity as the main factor in DBP robustness. Our results suggest that DBP models rely on stochasticity to evade the most effective attack direction, rather than directly countering adversarial perturbations. To improve the robustness of DBP models, we propose Adversarial Denoising Diffusion Training (ADDT). This technique uses Classifier-Guided Perturbation Optimization (CGPO) to generate adversarial perturbations through guidance from a pre-trained classifier, and uses Rank-Based Gaussian Mapping (RBGM) to convert adversarial perturbations into a normal Gaussian distribution. Empirical results show that ADDT improves the robustness of DBP models. Further experiments confirm that ADDT equips DBP models with the ability to directly counter adversarial perturbations.
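The abstract does not spell out RBGM, but a generic rank-to-Gaussian mapping of the kind the name suggests can be sketched with the standard library; treat this as an illustrative interpretation, not the paper's exact procedure:

```python
from statistics import NormalDist

def rank_based_gaussian_mapping(values):
    """Map arbitrary values to a standard-normal shape while preserving their
    ordering: replace each value by the Gaussian quantile of its empirical rank."""
    n = len(values)
    nd = NormalDist()  # standard normal, mean 0, sigma 1
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the quantiles strictly inside (0, 1)
        out[idx] = nd.inv_cdf((rank + 0.5) / n)
    return out

mapped = rank_based_gaussian_mapping([0.9, -3.0, 0.1, 7.0])
# Ranks are preserved: -3.0 < 0.1 < 0.9 < 7.0 maps to increasing quantiles.
print(sorted(range(4), key=lambda i: mapped[i]))  # [1, 2, 0, 3]
```

The point of such a mapping in ADDT's setting is that the reshaped perturbation looks like the Gaussian noise a diffusion model is trained to denoise, while still encoding the adversarial ordering of pixel perturbations.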
https://arxiv.org/abs/2404.14309
Urbanization challenges underscore the necessity for effective satellite image-text retrieval methods to swiftly access specific information enriched with geographic semantics for urban applications. However, existing methods often overlook significant domain gaps across diverse urban landscapes, primarily focusing on enhancing retrieval performance within single domains. To tackle this issue, we present UrbanCross, a new framework for cross-domain satellite image-text retrieval. UrbanCross leverages a high-quality, cross-domain dataset enriched with extensive geo-tags from three countries to highlight domain diversity. It employs the Large Multimodal Model (LMM) for textual refinement and the Segment Anything Model (SAM) for visual augmentation, achieving a fine-grained alignment of images, segments and texts, yielding a 10% improvement in retrieval performance. Additionally, UrbanCross incorporates an adaptive curriculum-based source sampler and a weighted adversarial cross-domain fine-tuning module, progressively enhancing adaptability across various domains. Extensive experiments confirm UrbanCross's superior efficiency in retrieval and adaptation to new urban environments, demonstrating an average performance increase of 15% over its version without domain adaptation mechanisms, effectively bridging the domain gap.
https://arxiv.org/abs/2404.14241
In the domain of differential equation-based generative modeling, conventional approaches often rely on single-dimensional scalar values as interpolation coefficients during both training and inference phases. In this work, we introduce, for the first time, a multidimensional interpolant that extends these coefficients into multiple dimensions, leveraging the stochastic interpolant framework. Additionally, we propose a novel path optimization problem tailored to adaptively determine multidimensional inference trajectories, with a predetermined differential equation solver and a fixed number of function evaluations. Our solution involves simulation dynamics coupled with adversarial training to optimize the inference path. Notably, employing a multidimensional interpolant during training improves the model's inference performance, even in the absence of path optimization. When the adaptive, multidimensional path derived from our optimization process is employed, it yields further performance gains, even with fixed solver configurations. The introduction of multidimensional interpolants not only enhances the efficacy of models but also opens up a new domain for exploration in training and inference methodologies, emphasizing the potential of multidimensional paths as an untapped frontier.
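The scalar-versus-multidimensional distinction can be made concrete with the stochastic interpolant form (notation ours; the paper's exact parameterization may differ):

```latex
% Scalar interpolant: a single coefficient pair shared by all coordinates
x_t = \alpha(t)\, x_0 + \beta(t)\, x_1,
\qquad \alpha, \beta : [0, 1] \to \mathbb{R}

% Multidimensional interpolant: per-coordinate coefficients, applied elementwise
x_t = \boldsymbol{\alpha}(t) \odot x_0 + \boldsymbol{\beta}(t) \odot x_1,
\qquad \boldsymbol{\alpha}, \boldsymbol{\beta} : [0, 1] \to \mathbb{R}^d
```

The path optimization described in the abstract then searches over these vector-valued coefficient schedules for the inference trajectory, rather than over a single shared scalar schedule.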
https://arxiv.org/abs/2404.14161
The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these alignments are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only the information essential for the target LLMs to respond with the expected answer. Moreover, we further consider a situation where the gradient is not visible, so that the defense is compatible with any LLM. Our empirical evaluations show that IBProtector outperforms current defense methods in mitigating jailbreak attempts, without overly affecting response quality or inference speed. Its effectiveness and adaptability across various attack methods and target LLMs underscore the potential of IBProtector as a novel, transferable defense that bolsters the security of LLMs without requiring modifications to the underlying models.
https://arxiv.org/abs/2404.13968