Existing defense methods against adversarial attacks can be categorized into training-time and test-time defenses. Training-time defense, i.e., adversarial training, requires a significant amount of extra training time and often fails to generalize to unseen attacks. On the other hand, test-time defense by test-time weight adaptation requires performing gradient descent on (part of) the model weights, which can be infeasible for models with frozen weights. To address these challenges, we propose DRAM, a novel defense method to Detect and Reconstruct multiple types of Adversarial attacks via Masked autoencoder (MAE). We demonstrate how to use MAE losses to build a Kolmogorov-Smirnov (KS) test that detects adversarial attacks. Moreover, the MAE losses can be used to repair adversarial samples from unseen attack types. In this sense, DRAM neither requires model weight updates at test time nor augments the training set with more adversarial samples. Evaluating DRAM on the large-scale ImageNet dataset, we achieve the best average detection rate, 82%, across eight types of adversarial attacks compared with other detection baselines. For reconstruction, DRAM improves robust accuracy by 6% to 41% for a standard ResNet50 and by 3% to 8% for a robust ResNet50 compared with other self-supervision tasks, such as rotation prediction and contrastive learning.
https://arxiv.org/abs/2303.12848
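The DRAM abstract says MAE losses feed a KS test for detection but gives no details. The sketch below is one plausible reading, not the paper's procedure. It assumes you have per-image MAE losses on clean held-out data (`clean_losses`) and on an incoming batch (`test_losses`), both hypothetical names. It flags the batch when the two-sample Kolmogorov-Smirnov statistic exceeds the standard asymptotic critical value.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


def looks_adversarial(clean_losses, test_losses, alpha=0.05):
    """Hypothetical detector: reject 'same distribution' at level alpha.

    clean_losses: assumed per-image MAE losses on held-out clean images.
    test_losses: per-image MAE losses on the batch under inspection.
    """
    d = ks_statistic(clean_losses, test_losses)
    return d > ks_critical_value(len(clean_losses), len(test_losses), alpha)
```

The test compares two distributions, so it needs a set of losses on the test side (a batch, or several masked reconstructions of one image). How DRAM forms that set is not stated in the abstract.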
Adversarial examples are crafted by adding indistinguishable perturbations to normal examples in order to fool a well-trained deep learning model into misclassifying them. In computer vision, this notion of indistinguishability is typically bounded by the $L_{\infty}$ norm or other norms. However, these norms are not appropriate for measuring indistinguishability for time series data. In this work, we propose, for the first time, adversarial examples in the Wasserstein space for time series data, and we use the Wasserstein distance to bound the perturbation between normal and adversarial examples. We introduce Wasserstein projected gradient descent (WPGD), an adversarial attack method for perturbing univariate time series data. We leverage the closed-form solution of the Wasserstein distance in 1D to compute the projection step of WPGD efficiently with gradient descent. We further propose a two-step projection so that the search for adversarial examples in the Wasserstein space is guided and constrained by Euclidean norms, yielding more effective and imperceptible perturbations. We empirically evaluate the proposed attack on several time series datasets in the healthcare domain. Extensive results demonstrate that the Wasserstein attack is powerful and can successfully attack most of the target classifiers with a high attack success rate. To better study the nature of Wasserstein adversarial examples, we evaluate a strong defense mechanism named Wasserstein smoothing for potential certified robustness. Although the defense achieves some accuracy gain, it still has limitations in many cases and leaves room for developing a stronger certified robustness method against Wasserstein adversarial examples on univariate time series data.
https://arxiv.org/abs/2303.12357
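The WPGD abstract relies on the closed form of the Wasserstein distance in 1D. A minimal sketch of that closed form follows, for W1 between two histograms on the same uniform grid. It assumes the series is treated as a non-negative mass distribution over time steps; how the paper maps series values to distributions is not stated. The WPGD projection step itself is not shown.

```python
def wasserstein_1d(p, q, spacing=1.0):
    """W1 distance between two histograms on the same uniform 1D grid.

    Closed form: W1 = sum_t |P(t) - Q(t)| * spacing, where P and Q are the
    cumulative distribution functions of the normalized histograms.
    Assumes p and q are non-negative and each has positive total mass.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same grid")
    if any(v < 0 for v in list(p) + list(q)):
        raise ValueError("histograms must be non-negative")
    total_p, total_q = float(sum(p)), float(sum(q))
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive mass")
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for mass_p, mass_q in zip(p, q):
        cdf_p += mass_p / total_p
        cdf_q += mass_q / total_q
        distance += abs(cdf_p - cdf_q) * spacing
    return distance


# Moving all mass two steps to the right costs 2 (mass 1 times distance 2).
print(wasserstein_1d([1, 0, 0], [0, 0, 1]))
```

Because the cost is a single linear pass over the CDFs, it stays cheap enough to evaluate inside every iteration of a projected attack.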
Sionna is a GPU-accelerated open-source library for link-level simulations based on TensorFlow. Its latest release (v0.14) integrates a differentiable ray tracer (RT) for the simulation of radio wave propagation. This unique feature allows for the computation of gradients of the channel impulse response and other related quantities with respect to many system and environment parameters, such as material properties, antenna patterns, array geometries, as well as transmitter and receiver orientations and positions. In this paper, we outline the key components of Sionna RT and showcase example applications such as learning radio materials and optimizing transmitter orientations by gradient descent. While classic ray tracing is a crucial tool for 6G research topics like reconfigurable intelligent surfaces, integrated sensing and communications, as well as user localization, differentiable ray tracing is a key enabler for many novel and exciting research directions, for example, digital twins.
https://arxiv.org/abs/2303.11103
Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated make a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to first-order logic (FOL), and for the first time introduces a formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs and to conduct their empirical study in neural network verification.
https://arxiv.org/abs/2303.10650
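To make the "interpretation function that translates expressions into loss functions" concrete, here is a tiny interpreter for one existing DL. It follows the DL2-style translation as we understand it: inequality maps to a hinge, conjunction to a sum, disjunction to a product. It is an illustration only, not LDL's general interpretation function. The tuple encoding of formulas is a hypothetical choice made for brevity.

```python
def term_value(term, env):
    """A term is either a variable name looked up in env or a numeric constant."""
    return float(env[term]) if isinstance(term, str) else float(term)


def dl2_loss(formula, env):
    """Translate a propositional constraint into a non-negative loss.

    Formulas are nested tuples (hypothetical encoding):
      ("leq", t1, t2)      -> max(t1 - t2, 0)
      ("and", f1, f2, ...) -> sum of the sub-losses
      ("or", f1, f2, ...)  -> product of the sub-losses
    A loss of 0 means the constraint is satisfied.
    """
    op = formula[0]
    if op == "leq":
        return max(term_value(formula[1], env) - term_value(formula[2], env), 0.0)
    if op == "and":
        return sum(dl2_loss(sub, env) for sub in formula[1:])
    if op == "or":
        result = 1.0
        for sub in formula[1:]:
            result *= dl2_loss(sub, env)
        return result
    raise ValueError(f"unsupported connective: {op!r}")


# "0 <= y <= 1" evaluated on a network output y = 2.0 gives loss 1.0.
in_unit = ("and", ("leq", 0, "y"), ("leq", "y", 1))
print(dl2_loss(in_unit, {"y": 2.0}))
```

Different DLs differ mainly in these per-connective rules. A meta-language like LDL lets the same formula be given different interpretations and compared.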
Both Byzantine resilience and communication efficiency have recently attracted tremendous attention for their significance in edge federated learning. However, most existing algorithms may fail when dealing with real-world irregular data that behaves in a heavy-tailed manner. To address this issue, we study the stochastic convex and non-convex optimization problem for federated learning at the edge and show how to handle heavy-tailed data while simultaneously retaining Byzantine resilience, communication efficiency, and optimal statistical error rates. Specifically, we first present a Byzantine-resilient distributed gradient descent algorithm that can handle heavy-tailed data and converges under standard assumptions. To reduce communication overhead, we further propose another algorithm that incorporates gradient compression techniques to save communication costs during the learning process. Theoretical analysis shows that our algorithms achieve order-optimal statistical error rates in the presence of Byzantine devices. Finally, we conduct extensive experiments on both synthetic and real-world datasets to verify the efficacy of our algorithms.
https://arxiv.org/abs/2303.10434
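The abstract does not name its robust aggregation rule. The sketch below uses the coordinate-wise trimmed mean, a standard Byzantine-resilient aggregator, as an assumed stand-in. For each coordinate it drops the largest and smallest values across workers before averaging. The server then takes one gradient descent step. The gradient compression used by the paper's second algorithm is not shown.

```python
def trimmed_mean(gradients, trim_fraction):
    """Coordinate-wise trimmed mean of worker gradients.

    For each coordinate, drop the k largest and k smallest values
    (k = floor(trim_fraction * n)) and average the rest, so a bounded number
    of Byzantine or heavy-tailed outliers cannot drag the aggregate far.
    """
    n = len(gradients)
    k = int(trim_fraction * n)
    if 2 * k >= n:
        raise ValueError("trim_fraction too large for the number of workers")
    dim = len(gradients[0])
    aggregate = []
    for d in range(dim):
        values = sorted(g[d] for g in gradients)
        kept = values[k:n - k]
        aggregate.append(sum(kept) / len(kept))
    return aggregate


def robust_sgd_step(weights, gradients, lr, trim_fraction):
    """One server-side gradient descent step using the robust aggregate."""
    agg = trimmed_mean(gradients, trim_fraction)
    return [w - lr * g for w, g in zip(weights, agg)]


# Three honest workers and one Byzantine worker sending a huge gradient.
grads = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [100.0, -100.0]]
print(trimmed_mean(grads, 0.25))
```

A plain mean here would be dominated by the fourth worker. The trimmed mean stays near the honest gradients.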
With the development of adversarial attacks, adversarial examples have been widely used to enhance the robustness of deep neural network models during training. Although considerable effort has gone into improving the transferability of adversarial examples, the attack success rate of transfer-based attacks on the surrogate model is much higher than that on the victim model under low attack strength (e.g., attack strength $\epsilon=8/255$). In this paper, we first systematically investigate this issue and find that the large gap in attack success rates between the surrogate model and the victim model is caused by the existence of a special area (called the fuzzy domain in our paper), in which adversarial examples are misclassified by the surrogate model but correctly classified by the victim model. Then, to eliminate this gap and improve the transferability of the generated adversarial examples, we propose a fuzziness-tuned method, consisting of a confidence scaling mechanism and a temperature scaling mechanism, which ensures that the generated adversarial examples can effectively escape the fuzzy domain. The confidence scaling mechanism and the temperature scaling mechanism collaboratively tune the fuzziness of the generated adversarial examples by adjusting the gradient descent weight of fuzziness and stabilizing the update direction, respectively. Specifically, the proposed fuzziness-tuned method can be integrated with existing adversarial attacks to further improve the transferability of adversarial examples without changing the time complexity. Extensive experiments demonstrate that the fuzziness-tuned method effectively enhances the transferability of adversarial examples in the latest transfer-based attacks.
https://arxiv.org/abs/2303.10078
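The abstract does not give the formulas for its confidence and temperature scaling mechanisms, so the paper's method is not reproduced here. The sketch below shows only the generic building block the name refers to: a temperature-scaled softmax, and the gradient of the cross-entropy loss with respect to the logits, (p - y) / T, which an attack's gradient step passes through.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; T > 1 flattens, T < 1 sharpens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad_wrt_logits(logits, label, temperature=1.0):
    """Gradient of -log softmax(logits / T)[label] w.r.t. the logits: (p - y) / T."""
    probs = softmax_with_temperature(logits, temperature)
    return [(p - (1.0 if k == label else 0.0)) / temperature
            for k, p in enumerate(probs)]


print(softmax_with_temperature([1.0, 2.0, 3.0], 1.0))
print(softmax_with_temperature([1.0, 2.0, 3.0], 10.0))  # noticeably flatter
```

A higher temperature flattens the predicted distribution, which changes both the size and the direction of the loss gradient an attack follows.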
We propose "collision cross-entropy" as a robust alternative to Shannon's cross-entropy in the context of self-labeled classification with posterior models. Assuming unlabeled data, self-labeling works by estimating latent pseudo-labels, categorical distributions y, that optimize some discriminative clustering criteria, e.g., "decisiveness" and "fairness". All existing self-labeled losses incorporate Shannon's cross-entropy term, which targets the model prediction (softmax) at the estimated distribution y. In fact, softmax is trained to mimic the uncertainty in y exactly. Instead, we propose the negative log-likelihood of "collision" to maximize the probability of equality between two random variables represented by the distributions softmax and y. We show that our loss satisfies some properties of a generalized cross-entropy. Interestingly, it agrees with Shannon's cross-entropy for one-hot pseudo-labels y, but training from softer labels is weakened. For example, if y is a uniform distribution at some data point, that point makes zero contribution to training. Our self-labeling loss, which combines collision cross-entropy with basic clustering criteria, is convex w.r.t. pseudo-labels but non-trivial to optimize over the probability simplex. We derive a practical EM algorithm that optimizes pseudo-labels y significantly faster than generic methods, e.g., projected gradient descent. The collision cross-entropy consistently improves results on multiple self-labeled clustering examples using different DNNs.
https://arxiv.org/abs/2303.07321
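Following the abstract's definition, the probability that independent variables X ~ softmax and Y ~ y are equal is the inner product of the two distributions. The collision cross-entropy is therefore -log Σ_k σ_k y_k. The sketch below checks the two properties the abstract states: it matches Shannon's cross-entropy for one-hot y, and it is constant (so it contributes no gradient) when y is uniform.

```python
import math


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_cross_entropy(y, sigma):
    """H(y, sigma) = -sum_k y_k log sigma_k (terms with y_k = 0 are skipped)."""
    return -sum(yk * math.log(sk) for yk, sk in zip(y, sigma) if yk > 0)


def collision_cross_entropy(y, sigma):
    """-log Pr(X = Y) for independent X ~ sigma and Y ~ y: -log sum_k sigma_k y_k."""
    return -math.log(sum(sk * yk for sk, yk in zip(sigma, y)))


sigma = softmax([2.0, 0.5, -1.0])
print(collision_cross_entropy([0, 1, 0], sigma), shannon_cross_entropy([0, 1, 0], sigma))
print(collision_cross_entropy([1 / 3] * 3, sigma))  # log(3) for any sigma
```

With a uniform y the inner product is 1/K whatever the prediction is, so the loss is flat in the model parameters. This is the weakened training from soft labels that the abstract describes.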
Fractals are geometric shapes that can display complex, self-similar patterns found in nature (e.g., clouds and plants). Recent works in visual recognition have leveraged this property to create random fractal images for model pre-training. In this paper, we study the inverse problem: given a target image (not necessarily a fractal), we aim to generate a fractal image that looks like it. We propose a novel approach that learns the parameters underlying a fractal image via gradient descent. We show that our approach can find fractal parameters of high visual quality and is compatible with different loss functions, opening up several possibilities, e.g., learning fractals for downstream tasks, scientific understanding, etc.
https://arxiv.org/abs/2303.12722
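The "parameters underlying a fractal image" are commonly the affine maps of an iterated function system (IFS). The sketch below assumes that representation and shows only the forward direction: rendering IFS parameters into an image with the classic chaos game. The differentiable rendering that the paper needs to learn these parameters by gradient descent is not shown. The Sierpinski parameters are a textbook example, not taken from the paper.

```python
import random

# Each map is (a, b, c, d, e, f) for (x, y) -> (a*x + b*y + e, c*x + d*y + f).
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def chaos_game(maps, num_points, seed=0, burn_in=20):
    """Sample points on the IFS attractor by applying randomly chosen maps."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for step in range(num_points + burn_in):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if step >= burn_in:  # early iterates may not be on the attractor yet
            points.append((x, y))
    return points


def rasterize(points, size):
    """Binary image of the points falling inside the unit square."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] = 1
    return grid


image = rasterize(chaos_game(SIERPINSKI, 5000), 16)
print("\n".join("".join("#" if v else "." for v in row) for row in reversed(image)))
```

Learning then means adjusting the six numbers per map so that the rendered image matches a target under some loss.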
Most previous methods for table recognition rely on training datasets containing many richly annotated table images. Detailed table image annotation, e.g., cell or text bounding box annotation, is, however, costly and often subjective. In this paper, we propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images. The proposed model consists of three main parts: an encoder for feature extraction, a structure decoder for generating table structure, and a cell decoder for predicting the content of each cell in the table. Our system is trained end-to-end by stochastic gradient descent, requiring only table images and their ground-truth HTML (or LaTeX) representations. To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset, built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images with corresponding HTML representations and cell bounding boxes. Extensive experiments on WikiTableSet and two large-scale datasets, FinTabNet and PubTabNet, demonstrate that the proposed weakly supervised model achieves better or similar accuracy compared to state-of-the-art models on all benchmark datasets.
https://arxiv.org/abs/2303.07641
Inspired by vision transformers, depth-wise convolution has been revisited to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, segmentation performance may saturate or even degrade as kernel sizes are scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited in maintaining optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances local convergence with parallel small-kernel branches, the optimal small-kernel branches may reduce computational efficiency during training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current state-of-the-art (SOTA) networks (e.g., 3D UX-Net, SwinUNETR) on 6 challenging public datasets. We derive an equivalence between kernel re-parameterization and branch-wise variation in kernel convergence. Inspired by spatial frequency in the human visual system, we extend this variation of kernel convergence to an element-wise setting and model spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is used to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. In our experiments, RepUX-Net consistently outperforms 3D SOTA benchmarks in Dice score in internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779), and transfer learning (AMOS: 0.880 to 0.911) scenarios.
https://arxiv.org/abs/2303.05785
Eye tracking is an important tool with a wide range of applications in Virtual, Augmented, and Mixed Reality (VR/AR/MR) technologies. State-of-the-art eye tracking methods are either reflection-based, tracking reflections of sparse point light sources, or image-based, exploiting 2D features of the acquired eye image. In this work, we attempt to significantly improve reflection-based methods by utilizing pixel-dense deflectometric surface measurements in combination with optimization-based inverse rendering algorithms. Using the known geometry of our deflectometric setup, we develop a differentiable rendering pipeline based on PyTorch3D that simulates a virtual eye under screen illumination. Finally, we exploit the image-screen correspondence information from the captured measurements to find the eye's rotation, translation, and shape parameters with our renderer via gradient descent. In general, our method does not require a specific pattern and can work with ordinary video frames of the main VR/AR/MR screen itself. We demonstrate real-world experiments with mean relative gaze errors below 0.45 degrees at a precision better than 0.11 degrees. Moreover, we show a 6X improvement over a representative reflection-based state-of-the-art method in simulation.
https://arxiv.org/abs/2303.04997
In recent years, contrastive learning has achieved impressive results in self-supervised visual representation learning, but a rigorous understanding of its learning dynamics is still lacking. In this paper, we show that if we cast a contrastive objective equivalently into the feature space, its learning dynamics admit an interpretable form. Specifically, we show that its gradient descent corresponds to a specific message passing scheme on the corresponding augmentation graph. Based on this perspective, we theoretically characterize how contrastive learning gradually learns discriminative features through the alignment update and the uniformity update. Meanwhile, this perspective also establishes an intriguing connection between contrastive learning and Message Passing Graph Neural Networks (MP-GNNs). This connection not only provides a unified understanding of many techniques developed independently in each community, but also enables us to borrow techniques from MP-GNNs to design new contrastive learning variants, such as graph attention, graph rewiring, jumping knowledge techniques, etc. We believe that our message passing perspective not only provides a new theoretical understanding of contrastive learning dynamics, but also bridges two seemingly independent areas, which could inspire more interleaved studies in which each area benefits from the other. The code is available at this https URL.
https://arxiv.org/abs/2303.04435
We study optimal transport-based distributionally robust optimization problems where a fictitious adversary, often envisioned as nature, can choose the distribution of the uncertain problem parameters by reshaping a prescribed reference distribution at a finite transportation cost. In this framework, we show that robustification is intimately related to various forms of variation and Lipschitz regularization even if the transportation cost function fails to be (some power of) a metric. We also derive conditions for the existence and the computability of a Nash equilibrium between the decision-maker and nature, and we demonstrate numerically that nature's Nash strategy can be viewed as a distribution that is supported on remarkably deceptive adversarial samples. Finally, we identify practically relevant classes of optimal transport-based distributionally robust optimization problems that can be addressed with efficient gradient descent algorithms even if the loss function or the transportation cost function are nonconvex (but not both at the same time).
https://arxiv.org/abs/2303.03900
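The simplest instance of the link between robustification and Lipschitz regularization is the classic metric case. There the transportation cost is a metric $d$ and the ambiguity set is a type-1 Wasserstein ball of radius $\varepsilon$ around the reference distribution $\hat{P}$. Kantorovich-Rubinstein duality then gives the bound below. The paper's results extend beyond this case, to costs that are not (a power of) a metric.

```latex
\sup_{Q \,:\, W_1(Q, \hat{P}) \le \varepsilon} \mathbb{E}_{\xi \sim Q}\big[\ell(\xi)\big]
\;\le\; \mathbb{E}_{\xi \sim \hat{P}}\big[\ell(\xi)\big] + \varepsilon \, \mathrm{Lip}_d(\ell),
\qquad
\mathrm{Lip}_d(\ell) = \sup_{\xi \neq \xi'} \frac{|\ell(\xi) - \ell(\xi')|}{d(\xi, \xi')}.
```

The bound follows because $\ell / \mathrm{Lip}_d(\ell)$ is 1-Lipschitz. Duality then gives $\mathbb{E}_Q[\ell] - \mathbb{E}_{\hat{P}}[\ell] \le \mathrm{Lip}_d(\ell)\, W_1(Q, \hat{P})$. The worst-case expected loss is therefore at most the nominal loss plus a Lipschitz penalty scaled by the transportation budget.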
Topological magnetic textures observed in experiments can, in principle, be predicted by theoretical calculations and numerical simulations. However, such calculations are generally hampered by the difficulty of distinguishing between local and global energy minima. This becomes particularly problematic for magnetic materials that allow for a multitude of topological charges. Finding solutions to such problems with classical numerical methods can be challenging because either a good initial guess or a huge amount of random sampling is required. In this study, we demonstrate an efficient way to identify these metastable configurations by leveraging gradient-descent-based optimization within the framework of a feedforward neural network, combined with a heuristic meta-search driven by random perturbations of the neural network's input. We illustrate the power of the method with an analysis of the Pd/Fe/Ir(111) system, which is well characterized experimentally.
https://arxiv.org/abs/2303.02876
Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, i.e., distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural networks.
https://arxiv.org/abs/2303.02536
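"Analyzing representations in non-standard bases" can be illustrated with an interchange intervention in a rotated coordinate system. The sketch rotates a base and a source hidden vector with an orthogonal matrix, takes the first k rotated coordinates from the source, and rotates back. In DAS-style analysis the rotation would be learned by gradient descent. Here it is fixed by hand, and all function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def rotation_2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def interchange_intervention(base_hidden, source_hidden, rotation, k):
    """Replace the first k rotated coordinates of base with those of source.

    rotation must be orthogonal, so its transpose maps the mixed vector back
    to the original neuron basis.
    """
    rotated_base = matvec(rotation, base_hidden)
    rotated_source = matvec(rotation, source_hidden)
    mixed = rotated_source[:k] + rotated_base[k:]
    return matvec(transpose(rotation), mixed)


# With a 45-degree rotation, the swapped "variable" is a mix of both neurons.
print(interchange_intervention([1.0, 2.0], [5.0, 6.0], rotation_2d(math.pi / 4), 1))
```

With the identity rotation this reduces to swapping whole neurons, which is the disjoint-neuron assumption the abstract criticizes. A non-trivial rotation lets one neuron contribute to several aligned variables.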
Our theoretical understanding of the inner workings of general convolutional neural networks (CNN) is limited. We here present a new stepping stone towards such understanding in the form of a theory of learning in linear CNNs. By analyzing the gradient descent equations, we discover that using convolutions leads to a mismatch between the dataset structure and the network structure. We show that linear CNNs discover the statistical structure of the dataset with non-linear, stage-like transitions, and that the speed of discovery changes depending on this structural mismatch. Moreover, we find that the mismatch lies at the heart of what we call the 'dominant frequency bias', where linear CNNs arrive at these discoveries using only the dominant frequencies of the different structural parts present in the dataset. Our findings can help explain several characteristics of general CNNs, such as their shortcut learning and their tendency to rely on texture instead of shape.
https://arxiv.org/abs/2303.02034
Machine Learning (ML) architectures have been applied to several applications involving sensitive data, where a guarantee of users' data privacy is required. Differentially Private Stochastic Gradient Descent (DPSGD) is the state-of-the-art method for training privacy-preserving models. However, DPSGD incurs a considerable accuracy loss, leading to sub-optimal privacy/utility trade-offs. To explore new ground for a better privacy/utility trade-off, this work asks (i) whether models' hyperparameters have any inherent impact on ML models' privacy-preserving properties, and (ii) whether models' hyperparameters have any impact on the privacy/utility trade-off of differentially private models. We propose a comprehensive design space exploration of different hyperparameters, such as the choice of activation function, the learning rate, and the use of batch normalization. Interestingly, we found that utility can be improved by using Bounded ReLU as the activation function while keeping the same privacy-preserving characteristics. With a drop-in replacement of the activation function, we achieve new state-of-the-art accuracy on MNIST (96.02%), FashionMNIST (84.76%), and CIFAR-10 (44.42%) without any modification to the fundamentals of the DPSGD learning procedure.
https://arxiv.org/abs/2303.01819
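The two ingredients named in this abstract can be sketched independently. The first is a Bounded ReLU activation, min(max(0, x), bound). The bound of 6.0 (as in ReLU6) is an assumption, since the abstract does not state the value used. The second is the core DPSGD aggregation step: clip each per-example gradient to norm C, sum, add Gaussian noise with standard deviation noise_multiplier * C, and average. Privacy accounting is not shown.

```python
import math
import random


def bounded_relu(x, bound=6.0):
    """Bounded ReLU: min(max(0, x), bound); bound=6.0 is an assumed value (ReLU6)."""
    return min(max(0.0, x), bound)


def clip(gradient, max_norm):
    """Scale a per-example gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def dpsgd_noisy_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
print(dpsgd_noisy_gradient([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.1, rng))
```

Swapping the activation changes nothing in this step. The clipping norm, noise scale, and therefore the privacy guarantee stay the same, which is why the abstract can call it a drop-in replacement.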
This dissertation is devoted to providing advanced nonconvex nonsmooth variational models for magnetic resonance imaging (MRI) reconstruction, efficient learnable image reconstruction algorithms, and parameter training algorithms that improve the accuracy and robustness of optimization-based deep learning methods for compressed sensing MRI reconstruction and synthesis. The first part introduces a novel optimization-based deep neural network whose architecture is inspired by proximal gradient descent for solving a variational model. The second part substantially extends the preliminary work in the first part by solving the calibration-free fast pMRI reconstruction problem in a discrete-time optimal control framework. The third part aims at developing a generalizable MRI reconstruction method in the meta-learning framework. The last part aims to synthesize a target MRI modality using partially scanned k-space data from source modalities instead of the fully scanned data used in state-of-the-art multimodal synthesis.
https://arxiv.org/abs/2303.01515
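The first part of the dissertation builds a network "inspired by proximal gradient descent". For reference, the sketch below shows the plain, non-learned iteration on a toy problem: minimize f(x) + λ‖x‖₁ by repeating x ← prox_{ηλ}(x − η∇f(x)), where the proximal operator of the L1 norm is soft thresholding. The dissertation's own variational model and learned components are not reproduced.

```python
def soft_threshold(v, threshold):
    """Proximal operator of threshold * ||.||_1, applied element-wise."""
    result = []
    for x in v:
        sign = (x > 0) - (x < 0)
        result.append(max(abs(x) - threshold, 0.0) * sign)
    return result


def proximal_gradient_descent(grad_f, x0, step, lam, iterations=200):
    """Minimize f(x) + lam * ||x||_1 via x <- prox_{step*lam}(x - step * grad_f(x))."""
    x = list(x0)
    for _ in range(iterations):
        g = grad_f(x)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x


# Toy problem: f(x) = 0.5 * ||x - b||^2, whose L1-regularized minimizer is
# soft_threshold(b, lam) = [2.0, 0.0] for b = [3.0, 0.5] and lam = 1.0.
b = [3.0, 0.5]
print(proximal_gradient_descent(lambda x: [xi - bi for xi, bi in zip(x, b)],
                                [0.0, 0.0], step=0.5, lam=1.0))
```

Unrolled networks of this kind typically fix the number of iterations and replace hand-chosen pieces, such as the step size or the regularizer's proximal map, with trainable ones.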
Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public datasets, it is not a priori clear which one may be most appropriate for the private task. We give an algorithm for selecting a public dataset by measuring a low-dimensional subspace distance between gradients of the public and private examples. We provide theoretical analysis demonstrating that the excess risk scales with this subspace distance. This distance is easy to compute and robust to modifications in the setting. Empirical evaluation shows that trained model accuracy is monotone in this distance.
https://arxiv.org/abs/2303.01256
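The abstract selects a public dataset by a low-dimensional subspace distance between public and private gradients, but does not name the metric. The sketch assumes the projection metric, sqrt(k − ‖UᵀV‖_F²), where U and V are orthonormal bases of two k-dimensional subspaces and ‖UᵀV‖_F² is the sum of squared cosines of their principal angles. In practice the spanning vectors would be top principal directions of each gradient matrix; that step is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coef = dot(w, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def projection_distance(vectors_a, vectors_b):
    """sqrt(k - ||U^T V||_F^2) for orthonormal bases U, V of equal dimension k."""
    u = orthonormal_basis(vectors_a)
    v = orthonormal_basis(vectors_b)
    if len(u) != len(v):
        raise ValueError("subspaces must have the same dimension")
    overlap = sum(dot(ui, vj) ** 2 for ui in u for vj in v)
    return math.sqrt(max(len(u) - overlap, 0.0))  # clamp tiny negative round-off


print(projection_distance([[1, 0]], [[1, 1]]))  # 45-degree angle: sqrt(0.5)
```

The distance is 0 for identical subspaces and grows with the principal angles. Under the abstract's claim that accuracy is monotone in this distance, one would pick the public dataset with the smallest value.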
As machine learning models, specifically neural networks, become increasingly popular, there are concerns regarding their trustworthiness, especially in safety-critical applications; e.g., the actions of an autonomous vehicle must be safe. There are approaches that can train neural networks with such domain requirements enforced as constraints, but they either cannot guarantee that the constraint will be satisfied by all possible predictions (even on unseen data) or are limited in the type of constraints that can be enforced. In this paper, we present an approach to train neural networks that can enforce a wide variety of constraints and guarantee that the constraint is satisfied by all possible predictions. The approach builds on earlier work where learning linear models is formulated as a constraint satisfaction problem (CSP). To make this idea applicable to neural networks, two crucial new elements are added: constraint propagation over the network layers, and weight updates based on a mix of gradient descent and CSP solving. Evaluation on various machine learning tasks demonstrates that our approach is flexible enough to enforce a wide variety of domain constraints and can guarantee them in neural networks.
https://arxiv.org/abs/2303.01141
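The abstract mentions constraint propagation over network layers without details. As a stand-in, not the paper's CSP formulation, the sketch uses interval bound propagation. It pushes an input box through linear and ReLU layers to get sound bounds on every output, then checks an output constraint (every output ≤ threshold) for all inputs in the box at once. The example weights are made up for illustration.

```python
def linear_interval(weights, bias, lower, upper):
    """Bounds of y = W x + b over the box [lower, upper] (exact per output)."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:  # a negative weight swaps which end of the interval is extreme
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_interval(lower, upper):
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def guaranteed_below(layers, lower, upper, threshold):
    """True only if every output is <= threshold for all inputs in the box.

    Sound but possibly conservative: False can mean 'not provable this way'.
    """
    for i, (weights, bias) in enumerate(layers):
        lower, upper = linear_interval(weights, bias, lower, upper)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lower, upper = relu_interval(lower, upper)
    return all(u <= threshold for u in upper)


# Hypothetical 2-2-1 network on inputs in [0, 1] x [0, 1].
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
print(guaranteed_below(layers, [0.0, 0.0], [1.0, 1.0], 2.0))  # provable
print(guaranteed_below(layers, [0.0, 0.0], [1.0, 1.0], 1.5))  # true max is 1.5, not provable here
```

The second call shows why propagation alone is not enough for guarantees: interval bounds lose correlations between neurons. This is one reason an approach may combine propagation with exact constraint solving.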