This paper presents a reactive navigation method that leverages Model Predictive Path Integral (MPPI) control enhanced with spline interpolation of the control input sequence and Stein Variational Gradient Descent (SVGD). The MPPI framework addresses a nonlinear optimization problem by determining an optimal sequence of control inputs through a sampling-based approach. The efficacy of MPPI is significantly influenced by the sampling noise. To rapidly identify routes that circumvent large and/or newly detected obstacles, it is essential to employ high levels of sampling noise. However, such high noise levels result in jerky control input sequences, leading to non-smooth trajectories. To mitigate this issue, we propose integrating spline interpolation within the MPPI process, enabling the generation of smooth control input sequences despite the use of substantial sampling noise. Nonetheless, the standard MPPI algorithm struggles in scenarios featuring multiple optimal or near-optimal solutions, such as environments with several viable obstacle avoidance paths, due to its assumption that the distribution over an optimal control input sequence can be closely approximated by a Gaussian distribution. To address this limitation, we extend our method by incorporating SVGD into the MPPI framework with spline interpolation. SVGD, rooted in the optimal transportation algorithm, possesses the unique ability to cluster samples around an optimal solution. Consequently, our approach facilitates robust reactive navigation by swiftly identifying obstacle avoidance paths while maintaining the smoothness of the control input sequences. The efficacy of our proposed method is validated in simulations with a quadrotor, demonstrating superior performance over existing baseline techniques.
https://arxiv.org/abs/2404.10395
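A minimal sketch of the spline-smoothed MPPI idea from the abstract above, assuming a single-integrator point robot and a hand-written obstacle cost (the SVGD extension is omitted for brevity): noise is sampled at a few spline knots and interpolated over the horizon, so the perturbed control sequences stay smooth even at large noise scales.

```python
# Toy MPPI step with spline-interpolated sampling noise (illustrative, not the
# paper's implementation). Dynamics, cost, and all constants are assumptions.
import numpy as np
from scipy.interpolate import CubicSpline

T, K, KNOTS = 30, 256, 6        # horizon, rollouts, spline knots
SIGMA, LAMBDA, DT = 2.0, 1.0, 0.1
goal = np.array([5.0, 5.0])

def dynamics(x, u):
    # single-integrator point robot: state (px, py), control (vx, vy)
    return x + DT * u

def cost(traj):
    # distance-to-goal plus a soft penalty around an obstacle at (2.5, 2.5)
    d_goal = np.linalg.norm(traj[-1] - goal)
    d_obs = np.linalg.norm(traj - np.array([2.5, 2.5]), axis=1)
    return d_goal + 10.0 * np.sum(np.maximum(0.0, 1.0 - d_obs))

u_nom = np.zeros((T, 2))        # nominal control sequence
x0 = np.zeros(2)
knot_t = np.linspace(0, T - 1, KNOTS)
full_t = np.arange(T)

costs = np.empty(K)
noises = np.empty((K, T, 2))
for k in range(K):
    knot_noise = SIGMA * np.random.randn(KNOTS, 2)          # large noise, few knots
    eps = CubicSpline(knot_t, knot_noise, axis=0)(full_t)   # smooth in time
    noises[k] = eps
    x, traj = x0, []
    for t in range(T):
        x = dynamics(x, u_nom[t] + eps[t])
        traj.append(x)
    costs[k] = cost(np.array(traj))

w = np.exp(-(costs - costs.min()) / LAMBDA)
w /= w.sum()
u_nom += np.einsum("k,ktd->td", w, noises)                  # MPPI weighted update
print("expected cost after one MPPI step:", float(w @ costs))
```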
This paper presents a novel approach to optimizing profit margins in non-life insurance markets through a gradient descent-based method, targeting three key objectives: 1) maximizing profit margins, 2) ensuring conversion rates, and 3) enforcing fairness criteria such as demographic parity (DP). Traditional pricing optimization, which leans heavily on linear and semidefinite programming, encounters challenges in balancing profitability and fairness. These challenges become especially pronounced in situations that necessitate continuous rate adjustments and the incorporation of fairness criteria. Specifically, indirect Ratebook optimization, a widely used method for new-business price setting, relies on predictor models such as XGBoost or GLMs/GAMs to estimate downstream, individually optimized prices. However, this strategy is prone to sequential errors and struggles to effectively manage optimization for continuous rate scenarios. In practice, to save time, actuaries frequently opt for optimization within discrete intervals (e.g., a range of [-20%, +20%] with fixed increments), leading to approximate estimates. Moreover, to circumvent infeasible solutions, they often use relaxed constraints, leading to suboptimal pricing strategies. The reverse-engineered nature of traditional models complicates the enforcement of fairness and can lead to biased outcomes. Our method addresses these challenges by employing a direct optimization strategy in the continuous space of rates and by embedding fairness through an adversarial predictor model. This innovation not only reduces sequential errors and simplifies the complexities found in traditional models but also directly integrates fairness measures into the commercial premium calculation. We demonstrate improved margin performance and stronger enforcement of fairness, highlighting the critical need to evolve existing pricing strategies.
https://arxiv.org/abs/2404.10275
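A hedged toy of direct optimization in the continuous space of rates, with a simple demographic-parity penalty standing in for the paper's adversarial predictor; the logistic conversion model and all coefficients below are illustrative assumptions, not the paper's setup.

```python
# Toy sketch: gradient ascent on expected profit over continuous rate
# multipliers, with a demographic-parity penalty on conversion rates.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
base = rng.uniform(300, 900, n)          # technical premium per policy (assumed)
group = rng.integers(0, 2, n)            # protected attribute (0/1)
a = rng.normal(1.0, 0.3, n)              # individual price-sensitivity offsets
b = 3.0                                  # slope of the conversion curve
r = np.ones(n)                           # continuous rate multipliers (decision)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, lam = 0.05, 50.0                     # step size, fairness weight
for step in range(500):
    p = sigmoid(a - b * (r - 1.0))       # conversion probability
    dp_dr = -b * p * (1 - p)
    margin = (r - 0.9) * base            # profit if the policy converts
    # gradient of expected profit: d/dr [p * margin]
    g_profit = dp_dr * margin + p * base
    # demographic-parity penalty on group-average conversion rates
    p0, p1 = p[group == 0].mean(), p[group == 1].mean()
    gap = p0 - p1
    g_fair = np.where(group == 0,
                      dp_dr / (group == 0).sum(),
                      -dp_dr / (group == 1).sum()) * 2 * gap
    r += lr * (g_profit / n - lam * g_fair)   # ascend profit, descend penalty

p = sigmoid(a - b * (r - 1.0))
profit = float((p * (r - 0.9) * base).mean())
gap = p[group == 0].mean() - p[group == 1].mean()
print(f"avg profit {profit:.1f}, DP gap {abs(gap):.4f}")
```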
While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which do not always generalize and may lead to poor-quality renderings. In addition, for real-world scenes, they rely on a good initial point cloud to perform well. In this work, we rethink 3D Gaussians as random samples drawn from an underlying probability distribution describing the physical representation of the scene -- in other words, Markov Chain Monte Carlo (MCMC) samples. Under this view, we show that the 3D Gaussian updates are strikingly similar to a Stochastic Gradient Langevin Dynamics (SGLD) update. As with MCMC, samples are nothing but past visit locations; adding new Gaussians under our framework can thus be realized without heuristics, simply by placing Gaussians at existing Gaussian locations. To encourage using fewer Gaussians for efficiency, we introduce an L1 regularizer on the Gaussians. On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.
https://arxiv.org/abs/2404.09591
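A 1D toy of the MCMC view, not the actual renderer: noisy SGLD-style updates on Gaussian means fitting a sample density, an L1-style shrinkage on weights, and relocation of low-weight Gaussians onto the positions of high-weight ones. The mixture model and all thresholds are assumptions for illustration.

```python
# Minimal sketch: SGLD-style updates + heuristic-free relocation of "dead"
# Gaussians, on a 1D density-fitting toy problem.
import numpy as np

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(3, 0.5, 500)])
K, sigma = 8, 0.4
mu = rng.uniform(-5, 5, K)
w = np.full(K, 1.0 / K)
lr, temp, l1 = 0.05, 1e-3, 1e-3

for step in range(300):
    batch = rng.choice(data, 128)
    # responsibilities under the current mixture (fixed sigma)
    logp = -(batch[:, None] - mu[None, :]) ** 2 / (2 * sigma**2) + np.log(w + 1e-12)
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # gradient of the negative log-likelihood w.r.t. the means
    g_mu = -(gamma * (batch[:, None] - mu[None, :])).sum(axis=0) / sigma**2
    # SGLD-style update: gradient step plus injected Gaussian noise
    mu -= lr * g_mu / len(batch) - np.sqrt(2 * lr * temp) * rng.normal(size=K)
    # weight update with L1 shrinkage toward fewer active components
    w = np.maximum(w + lr * (gamma.mean(axis=0) - w) - l1, 1e-4)
    w /= w.sum()
    # relocation: respawn dead components at live components' locations
    dead = w < 2e-3
    if dead.any():
        donors = rng.choice(K, dead.sum(), p=w)
        mu[dead] = mu[donors]
        w[dead] = w[donors] / 2
        w[donors] /= 2
        w /= w.sum()

print("means:", np.round(np.sort(mu), 2))
```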
The worldwide adoption of machine learning (ML) and deep learning models, particularly in critical sectors such as healthcare and finance, presents substantial challenges in maintaining individual privacy and fairness. These two elements are vital to a trustworthy environment for learning systems. While numerous studies have concentrated on protecting individual privacy through differential privacy (DP) mechanisms, emerging research indicates that differential privacy in machine learning models can unequally impact separate demographic subgroups in terms of prediction accuracy. This raises a fairness concern that manifests as biased performance. Although the prevailing view is that enhancing privacy intensifies fairness disparities, a smaller, yet significant, subset of research suggests the opposite. In this article, with extensive evaluation results, we demonstrate that the impact of differential privacy on fairness is not monotonic. Instead, we observe that the accuracy disparity initially grows as more DP noise (enhanced privacy) is added to the ML process, but subsequently diminishes at higher privacy levels with even more noise. Moreover, implementing gradient clipping in differentially private stochastic gradient descent (DP-SGD) can mitigate the negative impact of DP noise on fairness. This mitigation is achieved by moderating the disparity growth through a lower clipping threshold.
https://arxiv.org/abs/2404.09391
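For reference, the standard DP-SGD recipe the paper studies (per-example gradient clipping plus calibrated Gaussian noise), shown on a toy logistic regression; the clipping threshold C is the knob reported to moderate disparity growth. Data and constants are illustrative.

```python
# Standard DP-SGD sketch: clip each example's gradient to norm C, then add
# Gaussian noise proportional to C before averaging.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.3 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
lr, C, noise_mult, batch = 0.1, 1.0, 1.0, 64

for step in range(300):
    idx = rng.choice(n, batch, replace=False)
    z = X[idx] @ w
    p = 1.0 / (1.0 + np.exp(-z))
    # per-example gradients of the logistic loss, shape (batch, d)
    per_ex = (p - y[idx])[:, None] * X[idx]
    # clip each example's gradient to L2 norm at most C
    norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
    per_ex *= np.minimum(1.0, C / np.maximum(norms, 1e-12))
    # add Gaussian noise calibrated to the clipping threshold
    noisy = per_ex.sum(axis=0) + noise_mult * C * rng.normal(size=d)
    w -= lr * noisy / batch

acc = float(((X @ w > 0) == y.astype(bool)).mean())
print(f"accuracy with C={C}, sigma={noise_mult}: {acc:.3f}")
```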
This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. The model and the policy to be optimized are each implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set. The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function. This model-based approach allows, in principle, full realization on a quantum computer during the optimization phase and gives hope that a quantum advantage can be achieved as soon as sufficiently powerful quantum computers are available.
https://arxiv.org/abs/2404.10017
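A purely classical stand-in for the scheme's structure (the paper implements both the model and the policy as variational quantum circuits): a dynamics model fitted to an offline data set by gradient-free-friendly least squares, and a policy optimized gradient-free using the model's return estimate as the fitness function. Everything below is an illustrative assumption.

```python
# Classical sketch of model-based offline RL with gradient-free policy search.
import numpy as np

rng = np.random.default_rng(0)

def true_step(x, u):          # unknown environment (used only to log data)
    return 0.9 * x + 0.2 * u + 0.05 * rng.normal(size=x.shape)

# --- offline data set of (state, action, next state) transitions ---
S = rng.normal(size=(500, 1)); A = rng.uniform(-1, 1, (500, 1))
S2 = np.array([true_step(s, a) for s, a in zip(S, A)])

# --- model "training": least-squares fit x' ~ [x, u] @ theta ---
Phi = np.hstack([S, A])
theta, *_ = np.linalg.lstsq(Phi, S2, rcond=None)

def model_return(k, horizon=30):
    # fitness: return of the policy u = -k*x estimated by model rollouts
    x, ret = np.array([1.0]), 0.0
    for _ in range(horizon):
        u = -k * x
        x = np.hstack([x, u]) @ theta
        ret -= float(x**2 + 0.1 * u**2)   # negative quadratic cost
    return ret

# --- gradient-free (1+1)-style search over the policy parameter ---
k, best, step = 0.0, -np.inf, 0.3
for _ in range(200):
    cand = k + step * rng.normal()
    f = model_return(cand)
    if f > best:
        k, best = cand, f
print(f"learned gain k={k:.3f}, model return {best:.2f}")
```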
Neural Cellular Automata (NCA) is a class of Cellular Automata where the update rule is parameterized by a neural network that can be trained using gradient descent. In this paper, we focus on NCA models used for texture synthesis, where the update rule is inspired by partial differential equations (PDEs) describing reaction-diffusion systems. To train the NCA model, the spatio-temporal domain is discretized, and Euler integration is used to numerically simulate the PDE. However, whether a trained NCA truly learns the continuous dynamics described by the corresponding PDE or merely overfits the discretization used in training remains an open question. We study NCA models in the limit where the space-time discretization approaches continuity. We find that existing NCA models tend to overfit the training discretization, especially in the proximity of the initial condition, also called the "seed". To address this, we propose a solution that utilizes uniform noise as the initial condition. We demonstrate the effectiveness of our approach in preserving the consistency of NCA dynamics across a wide range of spatio-temporal granularities. Our improved NCA model enables two new test-time interactions by allowing continuous control over the speed of pattern formation and the scale of the synthesized patterns. We demonstrate this new NCA feature in our interactive online demo. Our work reveals that NCA models can learn continuous dynamics and opens new avenues for NCA research from a dynamical systems perspective.
https://arxiv.org/abs/2404.06279
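A toy version of the consistency question being tested, with an illustrative reaction-diffusion update standing in for a trained NCA: Euler-integrate from a uniform-noise seed at two temporal granularities covering the same total time and compare the end states.

```python
# Discretization-consistency check: same dynamics, same total time, two
# different Euler step sizes, starting from a uniform-noise seed.
import numpy as np

rng = np.random.default_rng(0)
N = 64
seed = rng.uniform(-0.1, 0.1, (N, N))      # uniform-noise initial condition

def laplacian(s):
    return (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
            np.roll(s, 1, 1) + np.roll(s, -1, 1) - 4 * s)

def f(s):
    # stand-in for the learned update rule: diffusion + cubic reaction
    return 0.2 * laplacian(s) + s - s**3

def integrate(s, dt, steps):
    for _ in range(steps):
        s = s + dt * f(s)                   # explicit Euler step
    return s

coarse = integrate(seed, dt=0.5, steps=40)
fine = integrate(seed, dt=0.25, steps=80)   # same total time, finer steps
gap = np.abs(coarse - fine).mean()
print(f"mean |coarse - fine| after equal total time: {gap:.4f}")
```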
One of the objectives of continual learning is to prevent catastrophic forgetting when learning multiple tasks sequentially, and existing solutions have been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning on each sequential task has so far been less studied. In this paper, we provide a convergence analysis of memory-based continual learning with stochastic gradient descent, along with empirical evidence that training on current tasks causes cumulative degradation of previous tasks. We propose an adaptive method for nonconvex continual learning (NCCL), which adjusts the step sizes of both previous and current tasks based on their gradients. The proposed method achieves the same convergence rate as SGD when the catastrophic forgetting term, which we define in the paper, is suppressed at each iteration. Further, we demonstrate that the proposed algorithm improves the performance of continual learning over existing methods on several image classification tasks.
https://arxiv.org/abs/2404.05555
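A schematic of the adaptive idea, not the paper's exact step-size rule: when the current-task gradient conflicts with the memory (previous-task) gradient, the current step is damped so the forgetting term is suppressed. Two quadratic "tasks" stand in for real losses.

```python
# Sketch of gradient-inner-product-based step-size adaptation for
# memory-based continual learning (illustrative rule, toy quadratics).
import numpy as np

rng = np.random.default_rng(0)
# two quadratic "tasks" with different optima: L_i(w) = ||w - c_i||^2
c_prev, c_cur = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = np.zeros(2)
alpha = 0.05

for step in range(200):
    g_mem = 2 * (w - c_prev)       # gradient on a memory (previous-task) batch
    g_cur = 2 * (w - c_cur)        # gradient on the current-task batch
    inner = float(g_mem @ g_cur)
    if inner < 0.0:
        # conflict: the current step would increase the previous-task loss
        scale_cur = alpha / (1.0 + abs(inner))   # damp the forgetting term
        scale_mem = alpha
    else:
        scale_cur = scale_mem = alpha
    w -= scale_cur * g_cur + scale_mem * g_mem

print("final w:", np.round(w, 3),
      "losses:", round(float(((w - c_prev)**2).sum()), 3),
      round(float(((w - c_cur)**2).sum()), 3))
```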
We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system which, in general, has a continuous state and action space, is nonlinear, and whose state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam and a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.
https://arxiv.org/abs/2404.05318
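A minimal instance of the prior-knowledge idea, with illustrative one-dimensional dynamics: a rough model supplies only the sensitivity of the next state to the action, which drives the gradient step, while the states themselves come from the unknown, noisy true system.

```python
# Sketch: online gradient descent on a control cost, using an approximate
# dynamics model (here just a rough input-gain estimate) for the gradient.
import numpy as np

rng = np.random.default_rng(0)

def true_dyn(x, u):
    return x + 0.1 * np.sin(x) + 0.5 * u        # unknown nonlinear system

B_hat = 0.45                # rough prior model: x' ~ x + B_hat * u (true gain 0.5)
target, u, eta = 2.0, 0.0, 0.4

x = 0.0
for t in range(100):
    x_next = true_dyn(x, u) + 0.01 * rng.normal()   # noisy observation
    # cost J = (x' - target)^2; chain rule through the *approximate* model
    grad_u = 2 * (x_next - target) * B_hat
    u -= eta * grad_u
    x = x_next

print(f"final state {x:.3f} (target {target}), action {u:.3f}")
```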
Single-pixel imaging (SPI) is a promising computational imaging technique that produces an image by solving an ill-posed reconstruction problem from the few measurements captured by a single-pixel detector. Deep learning has achieved impressive success on SPI reconstruction. However, the poor reconstruction performance and impractical imaging models of previous work limit its real-world applications. In this paper, we propose a deep unfolding network with a hybrid-attention Transformer on the Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras. Specifically, we unfold the computation graph of the iterative shrinkage-thresholding algorithm (ISTA) into two alternating modules: efficient tensor gradient descent and hybrid-attention multiscale denoising. By virtue of Kronecker SPI, the gradient descent module avoids the high computational overhead rooted in previous gradient descent modules based on vectorized SPI. The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration. Moreover, we build an SPI prototype to verify the effectiveness of the proposed method. Extensive experiments on synthetic and real data demonstrate that our method achieves state-of-the-art performance. The source code and pre-trained models are available at this https URL.
https://arxiv.org/abs/2404.05001
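For reference, the classical iteration that HATNet unfolds: ISTA alternates a gradient step on the data-fidelity term with a soft-thresholding (proximal/denoising) step. A toy sparse-recovery instance with a random sensing matrix, standing in for the actual SPI forward model:

```python
# Plain ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1 on a toy problem.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 80, 200, 10
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x_true

lam = 0.02
eta = 1.0 / np.linalg.norm(A, 2) ** 2       # step size 1/L, L = ||A||_2^2

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(n)
for it in range(300):
    x = soft(x - eta * A.T @ (A @ x - y),   # gradient step on the data term
             lam * eta)                     # proximal (soft-threshold) step

print("recovery error:", float(np.linalg.norm(x - x_true)))
```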
There currently exist two extreme viewpoints on neural network feature learning -- (i) neural networks simply implement a kernel method (a la NTK) and hence no features are learned; (ii) neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. Based on a novel viewpoint, we argue in this paper that neither interpretation is likely to be correct. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a path (one hidden unit per layer) through the sequence of hidden layers. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple -- each feature is effectively an indicator function for a region compactly described as an intersection of as many half-spaces in the input space as there are layers. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. We show that feature learning does occur in DLGNs, and that its mechanism is the learning of half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers -- they all represent a half-space -- however, the dynamics of gradient descent impart a distinct clustering to the later-layer neurons. We hypothesize that ReLU networks also have similar feature learning behaviour.
https://arxiv.org/abs/2404.04312
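A sketch of a DLGN-style forward pass as the abstract describes it (hard gates here for clarity; a trainable variant would typically soften them): gating pre-activations come from a purely linear network, so each gate is a half-space indicator in input space, and the gates modulate a parallel linear value pathway.

```python
# DLGN-style forward pass sketch: linear gating path -> half-space gates,
# applied multiplicatively to a linear value path, then a linear readout.
import numpy as np

rng = np.random.default_rng(0)
d, h, L, n_cls = 16, 32, 3, 4
G = [rng.normal(size=(h, d))] + [rng.normal(size=(h, h)) for _ in range(L - 1)]
V = [rng.normal(size=(h, d))] + [rng.normal(size=(h, h)) for _ in range(L - 1)]
W_out = rng.normal(size=(n_cls, h))

def dlgn_forward(x):
    g_pre, v = x, x
    for Gl, Vl in zip(G, V):
        g_pre = Gl @ g_pre                    # linear gating path (no nonlinearity),
                                              # so each pre-activation is linear in x
        gate = (g_pre > 0).astype(x.dtype)    # half-space indicator per unit
        v = (Vl @ v) * gate                   # gated linear value path
    return W_out @ v

x = rng.normal(size=d)
print("logits:", np.round(dlgn_forward(x), 3))
```

Because the gating path is deep-linear, a path through all L layers is active exactly when x lies in the intersection of L half-spaces, matching the feature description above.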
Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. Outdoor environments pose particular challenges due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we design a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.
https://arxiv.org/abs/2404.04026
In the recent, strongly emergent literature on few-shot CLIP adaptation, Linear Probe (LP) has often been reported as a weak baseline. This has motivated intensive research building convoluted prompt learning or feature adaptation strategies. In this work, we propose and examine, from convex-optimization perspectives, a generalization of the standard LP baseline in which the linear classifier weights are learnable functions of the text embedding, with class-wise multipliers blending image and text knowledge. As our objective function depends on two types of variables, i.e., the class visual prototypes and the learnable blending parameters, we propose a computationally efficient block coordinate Majorize-Minimize (MM) descent algorithm. In our full-batch MM optimizer, which we coin LP++, step sizes are implicit, unlike standard gradient descent practice, where learning rates are intensively searched over validation sets. By examining the mathematical properties of our loss (e.g., Lipschitz gradient continuity), we build majorizing functions yielding data-driven learning rates and derive approximations of the loss's minima, which provide data-informed initialization of the variables. Our image-language objective function, along with these non-trivial optimization insights and ingredients, yields, surprisingly, highly competitive few-shot CLIP performance. Furthermore, LP++ operates in black-box, relaxes intensive validation searches for the optimization hyper-parameters, and runs orders of magnitude faster than state-of-the-art few-shot CLIP adaptation methods. Our code is available at: this https URL.
https://arxiv.org/abs/2404.02285
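A heavily simplified schematic of the block-coordinate MM idea, not the paper's exact majorizers: alternate updates of the visual prototypes v and the class-wise blending multipliers alpha (classifier weights w_k = v_k + alpha_k * t_k), with implicit step sizes 1/L per block derived from a conservative gradient-Lipschitz bound rather than a tuned learning rate. Features, text embeddings, and the bounds below are assumptions for illustration.

```python
# Block-coordinate descent with Lipschitz-derived (implicit) step sizes on a
# cross-entropy over blended image-text classifier weights.
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 64, 32, 4                      # few-shot features, dim, classes
F = rng.normal(size=(N, d)); F /= np.linalg.norm(F, axis=1, keepdims=True)
T = rng.normal(size=(K, d)); T /= np.linalg.norm(T, axis=1, keepdims=True)
y = rng.integers(0, K, N)
Y = np.eye(K)[y]

v = T.copy()                             # text-informed initialization
alpha = np.ones(K)

# conservative per-block gradient-Lipschitz bounds (softmax CE Hessian <= 1/2)
L_v = 0.5 * np.max(np.sum(F**2, axis=1))
L_a = 0.5 * np.max((F @ T.T) ** 2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for it in range(200):
    # block 1: visual prototypes, step 1/L_v
    P = softmax(F @ (v + alpha[:, None] * T).T)
    R = (P - Y) / N                      # dCE/dlogits
    v -= (1.0 / L_v) * (R.T @ F)
    # block 2: blending multipliers, step 1/L_a
    P = softmax(F @ (v + alpha[:, None] * T).T)
    R = (P - Y) / N
    alpha -= (1.0 / L_a) * np.sum(R * (F @ T.T), axis=0)

acc = float((np.argmax(F @ (v + alpha[:, None] * T).T, axis=1) == y).mean())
print(f"train accuracy: {acc:.2f}")
```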
In this paper, we put forward a neural network framework to solve nonlinear hyperbolic systems. This framework, named relaxation neural networks (RelaxNN), is a simple and scalable extension of physics-informed neural networks (PINNs). We show that a typical PINN framework struggles to handle the shock waves that arise in the solutions of hyperbolic systems, which ultimately causes the gradient-descent-based optimization in the training process to fail. Relaxation systems provide a smooth asymptotic approximation to the discontinuous solution, under the expectation that macroscopic problems can be solved from a microscopic perspective. Building on relaxation systems, the RelaxNN framework alleviates the conflict of losses in the training process of the PINN framework. In addition to the remarkable results demonstrated in numerical simulations, most of the acceleration techniques and improvement strategies aimed at the standard PINN framework can also be applied to the RelaxNN framework.
https://arxiv.org/abs/2404.01163
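A hedged sketch of the approach, assuming a Jin-Xin-type relaxation of Burgers' equation u_t + (u^2/2)_x = 0 as the example system (the paper's systems, relaxation choice, and loss weighting may differ): the network predicts (u, v) and is trained on the relaxation residuals u_t + v_x = 0 and v_t + a^2 u_x = -(v - u^2/2)/eps, which stay smooth where the original flux form would form a shock.

```python
# PINN-style training on relaxation residuals (illustrative RelaxNN sketch).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 2),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
a2, eps = 1.0, 1e-2

def grad(out, var):
    return torch.autograd.grad(out, var, torch.ones_like(out), create_graph=True)[0]

for it in range(2000):
    xt = torch.rand(256, 2, requires_grad=True)   # (x, t) in [0,1]^2
    uv = net(xt)
    u, v = uv[:, :1], uv[:, 1:]
    du = grad(u, xt); dv = grad(v, xt)            # columns: d/dx, d/dt
    r1 = du[:, 1:] + dv[:, :1]                    # u_t + v_x
    r2 = dv[:, 1:] + a2 * du[:, :1] + (v - 0.5 * u**2) / eps
    # initial condition u(x, 0) = -sin(pi(2x-1)), v matched to the flux
    x0 = torch.rand(128, 1)
    uv0 = net(torch.cat([x0, torch.zeros_like(x0)], dim=1))
    u0 = -torch.sin(torch.pi * (2 * x0 - 1))
    loss = (r1**2).mean() + (r2**2).mean() \
         + ((uv0[:, :1] - u0)**2).mean() \
         + ((uv0[:, 1:] - 0.5 * u0**2)**2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final residual loss: {loss.item():.4f}")
```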
Deep representation learning methods struggle with continual learning, suffering from both catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unuseful units. While many methods address these two issues separately, only a few currently deal with both simultaneously. In this paper, we introduce Utility-based Perturbed Gradient Descent (UPGD) as a novel approach for the continual learning of representations. UPGD combines gradient updates with perturbations, applying smaller modifications to more useful units, protecting them from forgetting, and larger modifications to less useful units, rejuvenating their plasticity. We use a challenging streaming learning setup in which continual learning problems have hundreds of non-stationarities and unknown task boundaries. We show that many existing methods suffer from at least one of the issues, predominantly manifested by their decreasing accuracy over tasks. On the other hand, UPGD continues to improve performance and surpasses, or is competitive with, all methods on all problems. Finally, in extended reinforcement learning experiments with PPO, we show that while Adam exhibits a performance drop after initial learning, UPGD avoids it by addressing both continual learning issues.
https://arxiv.org/abs/2404.00781
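A sketch of the UPGD update on a toy objective, using the common first-order utility proxy -g*w (the estimated loss increase from removing a weight) and a simple sigmoid scaling; the paper's exact utility measure and scaling may differ.

```python
# UPGD-style update: perturbed gradient step, gated per-weight by
# (1 - scaled utility), so useful weights are protected and unuseful
# weights receive larger, plasticity-restoring modifications.
import numpy as np

rng = np.random.default_rng(0)
d = 20
w = rng.normal(size=d)
target = np.concatenate([rng.normal(size=10), np.zeros(10)])  # half the units matter
lr, noise_std = 0.05, 0.01

for step in range(500):
    g = 2 * (w - target)                      # gradient of ||w - target||^2
    utility = -g * w                          # first-order utility proxy
    u_scaled = 1.0 / (1.0 + np.exp(-utility / (np.abs(utility).max() + 1e-12)))
    xi = noise_std * rng.normal(size=d)
    w -= lr * (g + xi) * (1.0 - u_scaled)     # utility-gated perturbed step

print("error on useful block:", float(np.abs(w[:10] - target[:10]).mean()),
      "| drift on unused block:", float(np.abs(w[10:]).mean()))
```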
Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and has the potential to be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can then be easily determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the resulting problem. Additionally, we develop a DBSCAN-based point-traversal algorithm that identifies high-density clusters in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm to automatically search for all diagonal blocks in the permuted graph, with theoretically optimal guarantees in specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance over state-of-the-art clustering methods on every dataset.
https://arxiv.org/abs/2404.01341
In recent studies, line search methods have shown significant improvements in the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from Adam into its search direction, enabling efficient large-scale training, a task that was previously prone to failure using Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
https://arxiv.org/abs/2403.18519
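A sketch of the assumed form of the modification described above: keep Adam's moment estimates, but choose the step along Adam's preconditioned direction by Armijo backtracking rather than a tuned learning-rate schedule. Shown on a toy least-squares problem.

```python
# Armijo backtracking along the Adam search direction (illustrative form).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10)); b = rng.normal(size=50)
w = np.zeros(10)

def loss(w):    return 0.5 * np.sum((A @ w - b) ** 2)
def grad(w):    return A.T @ (A @ w - b)

m = np.zeros(10); v = np.zeros(10)
b1, b2, eps, c, eta0 = 0.9, 0.999, 1e-8, 1e-4, 1.0

for t in range(1, 201):
    g = grad(w)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    d = -m_hat / (np.sqrt(v_hat) + eps)        # Adam's search direction
    eta, f0, slope = eta0, loss(w), float(g @ d)
    # Armijo backtracking: shrink eta until sufficient decrease holds
    while loss(w + eta * d) > f0 + c * eta * slope and eta > 1e-10:
        eta *= 0.5
    w += eta * d

print(f"final loss: {loss(w):.6f}")
```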
Recent works have shown that line search methods greatly increase the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work, we succeed in extending line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network architecture into sensible units, performing the line search separately on these local units. Our optimization method outperforms the traditional Adam optimizer and achieves significant performance improvements for small datasets or small training budgets, while performing equally well or better in the other tested cases. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer that is compatible with arbitrary network architectures.
https://arxiv.org/abs/2403.18506
Past decades have witnessed great interest in the distinction and connection between neural network learning and kernel learning. Recent advancements have made theoretical progress in connecting infinitely wide neural networks and Gaussian processes. Two predominant approaches have emerged: the Neural Network Gaussian Process (NNGP) and the Neural Tangent Kernel (NTK). The former, rooted in Bayesian inference, represents a zero-order kernel, while the latter, grounded in the tangent space of gradient descent, is a first-order kernel. In this paper, we present the Unified Neural Kernel (UNK), which characterizes the learning dynamics of neural networks trained with gradient descent and parameter initialization. The proposed UNK kernel maintains the limiting properties of both NNGP and NTK, exhibiting behavior akin to NTK with a finite learning step and converging to NNGP as the learning step approaches infinity. Besides, we also theoretically characterize the uniform tightness and learning convergence of the UNK kernel, providing comprehensive insights into this unified kernel. Experimental results underscore the effectiveness of our proposed method.
https://arxiv.org/abs/2403.17467
Aiming to enhance the utilization of metric space by the parametric softmax classifier, recent studies suggest replacing it with a non-parametric alternative. Although a non-parametric classifier may provide better metric space utilization, it introduces the challenge of capturing inter-class relationships. A shared characteristic among prior non-parametric classifiers is the static assignment of labels to prototypes during training, i.e., each prototype consistently represents a class throughout the training course. Orthogonal to previous works, we present a simple yet effective method to optimize the category assigned to each prototype (the label-to-prototype assignment) during training. To this end, we formalize the problem as a two-step optimization objective over the network parameters and the label-to-prototype assignment mapping. We solve this optimization using a sequential combination of gradient descent and bipartite matching. We demonstrate the benefits of the proposed approach by conducting experiments on balanced and long-tail classification problems using different backbone network architectures. In particular, our method outperforms its competitors by 1.22% accuracy on CIFAR-100 and 2.15% on ImageNet-200, using a metric space dimension half the size of its competitors'. Code: this https URL
https://arxiv.org/abs/2403.16937
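A sketch of the alternating scheme described above: gradient-style updates of the prototypes interleaved with a bipartite-matching step (Hungarian algorithm via scipy's linear_sum_assignment) that re-optimizes the label-to-prototype assignment. Frozen random class means stand in for learned features here.

```python
# Alternating gradient steps and optimal label-to-prototype assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
K, d = 5, 16
protos = rng.normal(size=(K, d))               # trainable prototypes
class_means = rng.normal(size=(K, d))          # stand-in for learned features

for outer in range(20):
    # bipartite matching: optimal label-to-prototype assignment (Hungarian)
    cost = -class_means @ protos.T             # negative affinity as cost
    row, col = linear_sum_assignment(cost)
    inv = np.argsort(col)                      # class assigned to each prototype
    # gradient-style step: pull each prototype toward its assigned class mean
    protos += 0.3 * (class_means[inv] - protos)

row, col = linear_sum_assignment(-class_means @ protos.T)
print("label -> prototype assignment:", dict(zip(row.tolist(), col.tolist())))
```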
Effective learning in neuronal networks requires the adaptation of individual synapses given their relative contribution to solving a task. However, physical neuronal systems -- whether biological or artificial -- are constrained by spatio-temporal locality. How such networks can perform efficient credit assignment remains, to a large extent, an open question. In Machine Learning, the answer is almost universally given by the error backpropagation algorithm, through both space (BP) and time (BPTT). However, BP(TT) is well known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality, while forward-propagation models such as real-time recurrent learning (RTRL) suffer from prohibitive memory constraints. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of BPTT in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, performing an effective spatio-temporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint states necessary for useful parameter updates.
https://arxiv.org/abs/2403.16933