To defend deep neural networks against adversarial attacks, adversarial training has been drawing increasing attention for its effectiveness. However, the accuracy and robustness resulting from adversarial training are limited by the architecture, because adversarial training improves them only by adjusting the weight connections within a fixed architecture. In this work, we propose ARNAS to search for accurate and robust architectures for adversarial training. First, we design an accurate and robust search space, in which the placement of the cells and the proportional relationship of the filter numbers are carefully determined. With this design, architectures can obtain both accuracy and robustness by deploying accurate and robust structures at their respective sensitive positions. Then we propose a differentiable multi-objective search strategy that performs gradient descent in directions beneficial to both the natural loss and the adversarial loss, so that accuracy and robustness are pursued simultaneously. We conduct comprehensive experiments covering white-box attacks, black-box attacks, and transferability. Experimental results show that the searched architecture achieves the strongest robustness with competitive accuracy, and challenges the conventional view that NAS-based architectures cannot transfer well to complex tasks in robustness scenarios. By analyzing the outstanding architectures found in the search, we also conclude that accurate and robust neural architectures tend to deploy different structures near the input and the output, a finding of great practical significance for both the hand-crafted and the automated design of accurate and robust architectures.
https://arxiv.org/abs/2405.05502
Large pretrained language models (LLMs) have shown surprising in-context learning (ICL) ability. An important application when deploying LLMs is to augment them with a private database for a specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data, and their prompt data are vulnerable to membership inference attacks (MIA) and prompt-leaking attacks. To deal with this problem, we treat LLMs as untrusted with respect to privacy and propose a locally differentially private framework for in-context learning (LDP-ICL) in settings where the labels are sensitive. Considering the mechanisms of in-context learning in Transformers via gradient descent, we analyze the trade-off between privacy and utility in such LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. Finally, we perform several experiments to demonstrate our analysis results.
https://arxiv.org/abs/2405.04032
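The sensitive-label setting above is typically handled with randomized response, the classical mechanism for label-level local differential privacy. The sketch below shows that standard mechanism plus the usual debiasing step for discrete distribution estimation; it is a generic illustration, not the paper's exact LDP-ICL construction.

```python
import math
import random

def randomized_response(label, epsilon, num_classes=2, rng=random):
    """k-ary randomized response, the classical epsilon-LDP mechanism for a
    sensitive label: keep the true label with probability
    e^eps / (e^eps + k - 1), otherwise report a uniformly random other label."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return label
    return rng.choice([c for c in range(num_classes) if c != label])

def debias_counts(noisy_counts, epsilon, num_classes=2):
    """Unbiased estimate of the true label frequencies from randomized-response
    counts, as used in local-DP discrete distribution estimation."""
    n = sum(noisy_counts)
    p = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    q = (1 - p) / (num_classes - 1)
    return [(c / n - q) / (p - q) for c in noisy_counts]
```

Larger `epsilon` keeps labels intact more often (better utility, weaker privacy), which is exactly the trade-off the paper's analysis quantifies for in-context demonstrations.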
In this article, we consider designs of simple analog artificial neural networks based on adiabatic Josephson cells with a sigmoid activation function. A new approach based on the gradient descent method is developed to adjust the circuit parameters, allowing efficient signal transmission between the network layers. The proposed solution is demonstrated on a system implementing the XOR and OR logical operations.
https://arxiv.org/abs/2405.03521
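As a purely software analogue of the setup above, the following NumPy sketch trains a tiny sigmoid network on XOR with plain gradient descent. The architecture (one hidden layer of four units), learning rate, and initialization are illustrative choices of ours; the paper's contribution is tuning physical Josephson circuit parameters, not this generic network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=5000, lr=1.0, hidden=4, seed=0):
    """Gradient-descent training of a small sigmoid network on XOR:
    a software stand-in for adjusting sigmoid-activation cell parameters."""
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])
    W1 = rng.normal(size=(2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, 1)); b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)            # hidden layer
        out = sigmoid(h @ W2 + b2)          # output layer
        losses.append(float(np.mean((out - y) ** 2)))
        # backpropagate the mean-squared error through both sigmoid layers
        d_out = 2.0 * (out - y) / len(X) * out * (1.0 - out)
        d_h = d_out @ W2.T * h * (1.0 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
    return losses, out
```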
Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, for unconstrained problems, balancing the minimization of the objective function against the reduction of constraint violations is challenging. We consider the class of time-dependent minimization problems with an increasing, possibly nonlinear and non-convex, objective function and non-decreasing, possibly nonlinear and non-convex, inequality constraints. To solve them efficiently, we propose a penalty-based guardrail algorithm (PGA). This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable, which adds a margin to prevent violations. We evaluate PGA on two novel application domains: a simplified model of a district heating system and an optimization model derived from learned deep neural networks. Our method significantly outperforms mathematical programming solvers and the standard penalty-based method, and achieves better performance and faster convergence than a state-of-the-art algorithm (IPDD) within a specified time limit.
https://arxiv.org/abs/2405.01984
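To make the guardrail idea concrete, here is a minimal penalized gradient step for `min f(x) s.t. g(x) <= 0` where a margin shifts the constraint's right-hand side. The quadratic penalty, the fixed margin, and the specific update rule are our illustrative assumptions, not the paper's exact PGA algorithm (which updates the guardrail dynamically).

```python
def penalty_guardrail_step(x, grad_f, g, grad_g, rho=100.0, lr=0.005, margin=0.05):
    """One illustrative PGA-style step: gradient descent on
    f(x) + rho * max(0, g(x) + margin)^2. The guardrail `margin` tightens
    the constraint g(x) <= 0 to g(x) <= -margin, keeping headroom
    against violations."""
    violation = max(0.0, g(x) + margin)
    grad = grad_f(x) + rho * 2.0 * violation * grad_g(x)
    return x - lr * grad
```

For example, minimizing x² subject to x ≥ 1 (i.e., g(x) = 1 − x ≤ 0), repeated steps settle near the stationary point of the penalized objective, on the feasible side of the constraint.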
Antimicrobial peptides (AMPs) have exhibited unprecedented potential as biomaterials in combating multidrug-resistant bacteria. Despite the increasing adoption of artificial intelligence for novel AMP design, challenges pertaining to conflicting attributes such as activity, hemolysis, and toxicity have significantly impeded the progress of researchers. This paper introduces a paradigm shift by considering multiple attributes in AMP design. Presented herein is a novel approach termed Hypervolume-driven Multi-objective Antimicrobial Peptide Design (HMAMP), which prioritizes the simultaneous optimization of multiple attributes of AMPs. By synergizing reinforcement learning and a gradient descent algorithm rooted in the hypervolume maximization concept, HMAMP effectively expands exploration space and mitigates the issue of pattern collapse. This method generates a wide array of prospective AMP candidates that strike a balance among diverse attributes. Furthermore, we pinpoint knee points along the Pareto front of these candidate AMPs. Empirical results across five benchmark models substantiate that HMAMP-designed AMPs exhibit competitive performance and heightened diversity. A detailed analysis of the helical structures and molecular dynamics simulations for ten potential candidate AMPs validates the superiority of HMAMP in the realm of multi-objective AMP design. The ability of HMAMP to systematically craft AMPs considering multiple attributes marks a pioneering milestone, establishing a universal computational framework for the multi-objective design of AMPs.
https://arxiv.org/abs/2405.00753
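Hypervolume, the quantity HMAMP maximizes, has a simple exact form in two objectives. The sweep below computes it for maximization with respect to a reference point; this is the standard 2-D algorithm, shown for intuition (HMAMP's gradient-based hypervolume maximization over peptide generators is far more involved).

```python
def hypervolume_2d(points, ref):
    """Exact 2-D hypervolume (maximization): the area dominated by `points`
    and bounded below-left by the reference point `ref`."""
    rx, ry = ref
    hv, prev_y = 0.0, ry
    # sweep from the largest first objective down; each non-dominated point
    # contributes a new horizontal slab above the best y seen so far
    for x, y in sorted(points, key=lambda p: -p[0]):
        if x > rx and y > prev_y:
            hv += (x - rx) * (y - prev_y)
            prev_y = y
    return hv
```

Dominated candidates add no area, which is why pushing the Pareto front outward is equivalent to increasing this value.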
Optimization techniques in deep learning are predominantly led by first-order gradient methodologies such as SGD. However, neural network training can greatly benefit from the rapid convergence characteristics of second-order optimization. Newton's GD stands out in this category by rescaling the gradient with the inverse Hessian. Nevertheless, one of its major bottlenecks is matrix inversion, which requires $O(N^3)$ time and scales poorly. Matrix inversion can be translated into solving a series of linear equations. Given that quantum linear solver algorithms (QLSAs), leveraging the principles of quantum superposition and entanglement, can operate within $\text{polylog}(N)$ time, they present a promising approach with exponential acceleration. Specifically, one of the most recent QLSAs demonstrates a complexity scaling of $O(d\cdot\kappa \log(N\cdot\kappa/\epsilon))$, depending on the size $N$, condition number $\kappa$, error tolerance $\epsilon$, and quantum oracle sparsity $d$ of the matrix. However, this also implies that their potential exponential advantage may be hindered by certain properties (i.e., $\kappa$ and $d$). We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD. Q-Newton utilizes a streamlined scheduling module that coordinates between quantum and classical linear solvers by estimating and reducing $\kappa$ and constructing $d$ for the quantum solver. Our evaluation showcases the potential for Q-Newton to significantly reduce the total training time compared to commonly used optimizers like SGD. We hypothesize a future scenario in which the gate time of quantum machines is reduced, possibly realized by attosecond physics. Our evaluation establishes an ambitious and promising target for the evolution of quantum computing.
https://arxiv.org/abs/2405.00252
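The classical side of the scheduler reduces to the textbook Newton step: rather than inverting the Hessian, solve the linear system $H d = g$. The sketch below shows that step with Tikhonov damping (our addition for conditioning) and also returns the condition number $\kappa$, the quantity Q-Newton estimates when routing between solvers.

```python
import numpy as np

def newton_step(params, grad, hessian, damping=1e-4):
    """One Newton's-method step: solve (H + damping*I) d = g instead of
    inverting H, since matrix inversion is the O(N^3) bottleneck discussed
    above. The damping term keeps the system well conditioned."""
    H = hessian + damping * np.eye(len(params))
    direction = np.linalg.solve(H, grad)
    return params - direction, np.linalg.cond(H)
```

On a quadratic objective $\tfrac12 x^\top A x - b^\top x$, a single step lands (up to damping) on the exact minimizer $A^{-1}b$, which is the rapid-convergence property the abstract refers to.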
Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we prove that SAM learns easy and difficult features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. Based on this observation, we propose USEFUL, an algorithm that clusters examples based on the network output early in training and upsamples examples with no easy features to alleviate the pitfalls of the simplicity bias. We show empirically that modifying the training data distribution in this way effectively improves the generalization performance on the original data distribution when training with (S)GD by mimicking the training dynamics of SAM. Notably, we demonstrate that our method can be combined with SAM and existing data augmentation strategies to achieve, to the best of our knowledge, state-of-the-art performance for training ResNet18 on CIFAR10, STL10, CINIC10, Tiny-ImageNet; ResNet34 on CIFAR100; and VGG19 and DenseNet121 on CIFAR10.
https://arxiv.org/abs/2404.17768
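The resampling step of USEFUL can be sketched in a few lines. The version below clusters a 1-D early-training signal (per-example loss, a simplification we substitute for clustering the actual network outputs) into two groups and duplicates the high-loss group; the two-means initialization and duplication factor are illustrative assumptions.

```python
def useful_upsample(examples, losses, rounds=20):
    """Sketch of USEFUL's resampling idea: cluster examples by an
    early-training signal (here, 1-D 2-means on per-example loss) and
    duplicate the high-loss cluster, so examples with no easy features
    are seen more often by (S)GD."""
    c = [min(losses), max(losses)]            # init centroids at the extremes
    for _ in range(rounds):                   # plain 1-D 2-means
        groups = [[], []]
        for l in losses:
            groups[abs(l - c[0]) > abs(l - c[1])].append(l)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    hard = [ex for ex, l in zip(examples, losses)
            if abs(l - c[0]) > abs(l - c[1])] # closer to the high-loss centroid
    return list(examples) + hard              # upsample by duplication
```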
In this paper, we focus on the problem of conformal prediction with conditional guarantees. Prior work has shown that it is impossible to construct nontrivial prediction sets with full conditional coverage guarantees. A wealth of research has considered relaxations of full conditional guarantees, relying on some predefined uncertainty structures. Departing from this line of thinking, we propose Partition Learning Conformal Prediction (PLCP), a framework to improve conditional validity of prediction sets through learning uncertainty-guided features from the calibration data. We implement PLCP efficiently with alternating gradient descent, utilizing off-the-shelf machine learning models. We further analyze PLCP theoretically and provide conditional guarantees for infinite and finite sample sizes. Finally, our experimental results over four real-world and synthetic datasets show the superior performance of PLCP compared to state-of-the-art methods in terms of coverage and length in both classification and regression scenarios.
https://arxiv.org/abs/2404.17487
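For context, the marginal-coverage baseline that PLCP refines toward conditional validity is standard split conformal prediction, shown below for regression. This is the textbook procedure, not PLCP itself, which additionally learns uncertainty-guided features to partition the calibration data.

```python
import math

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Standard split conformal prediction: given absolute residuals
    |y_i - f(x_i)| on a held-out calibration set, return the half-width q
    such that [f(x) - q, f(x) + q] covers a fresh exchangeable point with
    probability >= 1 - alpha (marginally, not conditionally)."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))      # conformal quantile index
    return sorted(cal_residuals)[min(k, n) - 1]
```

The guarantee holds only on average over the input distribution; PLCP's contribution is to tighten this toward coverage conditional on learned partitions of the feature space.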
This paper presents a novel stochastic barrier function (SBF) framework for safety analysis of stochastic systems based on piecewise (PW) functions. We first outline a general formulation of PW-SBFs. Then, we focus on PW-Constant (PWC) SBFs and show how their simplicity yields computational advantages for general stochastic systems. Specifically, we prove that synthesis of PWC-SBFs reduces to a minimax optimization problem. Then, we introduce three efficient algorithms to solve this problem, each offering distinct advantages and disadvantages. The first algorithm is based on dual linear programming (LP), which provides an exact solution to the minimax optimization problem. The second is a more scalable algorithm based on iterative counter-example guided synthesis, which involves solving two smaller LPs. The third algorithm solves the minimax problem using gradient descent, which admits even better scalability. We provide an extensive evaluation of these methods on various case studies, including neural network dynamic models, nonlinear switched systems, and high-dimensional linear systems. Our benchmarks demonstrate that PWC-SBFs outperform state-of-the-art methods, namely sum-of-squares and neural barrier functions, and can scale to eight dimensional systems.
https://arxiv.org/abs/2404.16986
Informative gradients are often lost in large-batch updates. We propose a robust mechanism to reinforce the sparse components within a random batch of data points. A finite queue of online gradients is used to determine their expected instantaneous statistics. We propose a function that measures the scarcity of incoming gradients using these statistics, and we establish the theoretical grounding of this mechanism. To minimize conflicting components within large mini-batches, samples with aligned objectives are grouped by clustering in the inherent feature space. Sparsity is measured for each centroid and weighted accordingly. The backbone of the system is a strong, intuitive criterion for squeezing redundant information out of each cluster. It makes rare information resistant to aggressive momentum and also exhibits superior performance over larger mini-batch horizons. The effective length of the queue is kept variable to follow the local loss pattern. The contribution of our method is to restore intra-mini-batch diversity while widening the optimal batch boundary; together, these drive optimization deeper toward the minima. Our method has shown superior performance on the CIFAR10, MNIST, and Reuters news-category datasets compared to mini-batch gradient descent.
https://arxiv.org/abs/2404.16917
This paper presents a 6-DoF range-based Monte Carlo localization method with a GPU-accelerated Stein particle filter. To update a massive amount of particles, we propose a Gauss-Newton-based Stein variational gradient descent (SVGD) with iterative neighbor particle search. This method uses SVGD to collectively update particle states with gradient and neighborhood information, which provides efficient particle sampling. For an efficient neighbor particle search, it uses locality-sensitive hashing and iteratively updates the neighbor list of each particle over time. The neighbor list is then used to propagate the posterior probabilities of particles over the neighbor particle graph. The proposed method is capable of evaluating one million particles in real time on a single GPU and enables robust pose initialization and re-localization without an initial pose estimate. In experiments, the proposed method showed extreme robustness to complete sensor occlusion (i.e., kidnapping) and enabled pinpoint sensor localization without any prior information.
https://arxiv.org/abs/2404.16370
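For reference, one step of the standard SVGD update with an RBF kernel looks as follows. This is the baseline method the paper builds on; its Gauss-Newton variant, locality-sensitive hashing, and neighbor-graph propagation are beyond this sketch, and the bandwidth/step sizes here are illustrative.

```python
import numpy as np

def svgd_step(particles, grad_log_p, bandwidth=1.0, step=0.1):
    """One standard SVGD update:
    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ].
    The first (attraction) term pulls particles toward high target density;
    the second (repulsion) term keeps the particle set diverse."""
    n = len(particles)
    diffs = particles[:, None, :] - particles[None, :, :]   # x_i - x_j
    sq = (diffs ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))                  # RBF kernel matrix
    grads = np.stack([grad_log_p(x) for x in particles])    # (n, d)
    # grad_{x_j} k(x_j, x_i) = (x_i - x_j) / h^2 * k(x_j, x_i)
    phi = (K @ grads + (K[..., None] * diffs).sum(1) / bandwidth ** 2) / n
    return particles + step * phi
```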
Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow cheap evaluations of neural network sub-modules to serve as surrogates for architecture evaluations. Yet sometimes the hierarchy is too restrictive, or the surrogate fails to generalize. We present FaDE, which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm, for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited to deep hierarchical, i.e., multi-cell, search spaces, which it can explore at linear instead of exponential cost, and it therefore eliminates the need for a proxy search space. Our experiments show that, firstly, FaDE ranks on finite regions of the search space correlate with the corresponding architecture performances and, secondly, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space.
https://arxiv.org/abs/2404.16218
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).
https://arxiv.org/abs/2404.15259
This paper puts forth a new training-data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard, without any access to the training data of FL. This advancement leads to the VGAE-MP attack, which is not only efficacious but also remains elusive to detection. VGAE-MP extracts graph structural correlations among the benign local models and the training data features, adversarially regenerates the graph structure, and generates malicious local models using the adversarial graph structure and the benign models' features. Moreover, a new attacking algorithm is presented to train the malicious local models using VGAE and sub-gradient descent, while enabling an optimal selection of the benign local models for training the VGAE. Experiments demonstrate a gradual drop in FL accuracy under the proposed VGAE-MP attack and the ineffectiveness of existing defense mechanisms in detecting it, posing a severe threat to FL.
https://arxiv.org/abs/2404.15042
Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales. We present a physiologically inspired speech recognition architecture, compatible and scalable with deep learning frameworks, and demonstrate that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network. Significant cross-frequency couplings, indicative of these oscillations, are measured within and across network layers during speech processing, whereas no such interactions are observed when handling background noise inputs. Furthermore, our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance. Overall, on top of developing our understanding of synchronisation phenomena notably observed in the human auditory pathway, our architecture exhibits dynamic and efficient information processing, with relevance to neuromorphic technology.
https://arxiv.org/abs/2404.14024
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, the high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to dealing with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training-from-scratch and fine-tuning scenarios. Initial results on the CIFAR-10 and ImageNet datasets, using ResNet20 and ResNet18 models respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
https://arxiv.org/abs/2404.16876
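The core mechanic of a relaxed bit-width can be sketched compactly: keep the bit-width as a real number for gradient updates, but round it whenever the quantizer runs. The symmetric uniform grid and the parameter names below are our illustrative assumptions, not AdaQAT's exact formulation.

```python
import numpy as np

def quantize(x, bitwidth_real, x_max=1.0):
    """Uniform symmetric quantization with a relaxed real-valued bit-width,
    in the spirit of AdaQAT: the bit-width is discretized (rounded) for the
    quantization op itself, while the underlying real value stays available
    for a gradient-descent update (e.g., via a straight-through estimator)."""
    b = int(round(bitwidth_real))           # discretize for the forward pass
    levels = 2 ** (b - 1) - 1               # symmetric signed grid
    step = x_max / levels
    return np.clip(np.round(x / step), -levels, levels) * step
```

Because the forward pass only sees the rounded value, nudging `bitwidth_real` by gradient descent trades quantization error against cost smoothly, which is the knob the training procedure optimizes.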
A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradient of hidden representations and parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par or better than existing PEFT methods on various benchmarks.
https://arxiv.org/abs/2404.13844
The Neural Tangent Kernel (NTK) has emerged as a fundamental concept in the study of wide Neural Networks. In particular, it is known that the positivity of the NTK is directly related to the memorization capacity of sufficiently wide networks, i.e., to the possibility of reaching zero loss in training, via gradient descent. Here we will improve on previous works and obtain a sharp result concerning the positivity of the NTK of feedforward networks of any depth. More precisely, we will show that, for any non-polynomial activation function, the NTK is strictly positive definite. Our results are based on a novel characterization of polynomial functions which is of independent interest.
https://arxiv.org/abs/2404.12928
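For reference, the kernel in question is the standard NTK of a network $f(\,\cdot\,;\theta)$ (this is the textbook definition, restated here rather than quoted from the paper):

```latex
\Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \big\rangle,
\qquad
\Theta_{ij} \;=\; \Theta(x_i, x_j).
```

Strict positive definiteness means the Gram matrix $\Theta_{ij}$ over any distinct inputs $x_1,\dots,x_n$ satisfies $v^\top \Theta\, v > 0$ for all $v \neq 0$, which is the property tying the kernel to the ability of gradient descent to reach zero training loss in sufficiently wide networks.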
Conditional Generative Adversarial Networks (CGANs) exhibit significant potential in supervised learning model training by virtue of their ability to generate realistic labeled images. However, numerous studies have indicated the privacy leakage risk in CGANs models. The solution DPCGAN, incorporating the differential privacy framework, faces challenges such as heavy reliance on labeled data for model training and potential disruptions to original gradient information due to excessive gradient clipping, making it difficult to ensure model accuracy. To address these challenges, we present a privacy-preserving training framework called PATE-TripleGAN. This framework incorporates a classifier to pre-classify unlabeled data, establishing a three-party min-max game to reduce dependence on labeled data. Furthermore, we present a hybrid gradient desensitization algorithm based on the Private Aggregation of Teacher Ensembles (PATE) framework and Differential Private Stochastic Gradient Descent (DPSGD) method. This algorithm allows the model to retain gradient information more effectively while ensuring privacy protection, thereby enhancing the model's utility. Privacy analysis and extensive experiments affirm that the PATE-TripleGAN model can generate a higher quality labeled image dataset while ensuring the privacy of the training data.
https://arxiv.org/abs/2404.12730
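The DPSGD component referenced above consists of per-example gradient clipping plus calibrated Gaussian noise. The sketch below shows that standard update in isolation; the paper's contribution is a hybrid desensitization that limits how much this clipping distorts the original gradients, which is not reproduced here.

```python
import numpy as np

def dpsgd_update(params, per_example_grads, lr=0.1, clip_norm=1.0,
                 noise_mult=1.0, rng=None):
    """One DPSGD-style step: clip each per-example gradient to L2 norm
    <= clip_norm, average, and add Gaussian noise scaled by
    noise_mult * clip_norm. Excessive clipping is the gradient-information
    loss the abstract's hybrid algorithm tries to limit."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return params - lr * (mean + noise)
```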
This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning. We first characterize a tighter generalization bound for one-round federated learning based on the local clients' generalization and the heterogeneity of the data distribution (non-iid scenario). We also characterize a generalization bound for R-round federated learning and its relation to the number of local updates (local stochastic gradient descent (SGD) steps). Then, based on our generalization bound analysis and our representation learning interpretation of this analysis, we show for the first time that less frequent aggregation, and hence more local updates, for the representation extractor (usually corresponding to the initial layers) leads to more generalizable models, particularly in non-iid scenarios. We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis. FedALS employs varying aggregation frequencies for different parts of the model to reduce the communication cost. The paper concludes with experimental results showing the effectiveness of FedALS.
https://arxiv.org/abs/2404.11754
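A back-of-the-envelope accounting shows why per-part aggregation periods cut communication. The function below is our illustrative cost model only (parameter counts and periods are made up); the paper derives its actual schedule from the generalization bounds.

```python
def fedals_comm_cost(local_steps, part_sizes, agg_periods):
    """Illustrative FedALS-style communication accounting: model part i
    (with part_sizes[i] parameters) is aggregated once every
    agg_periods[i] local SGD steps, so aggregating the large representation
    extractor less often is what reduces the communication cost."""
    return sum(size * (local_steps // period)
               for size, period in zip(part_sizes, agg_periods))
```

For instance, with a 1000-parameter extractor synced every 4 steps and a 10-parameter head synced every step, 12 local steps cost far less than syncing everything every step as in vanilla FedAvg.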