As Machine Learning (ML) is increasingly used to solve various tasks in real-world applications, it is crucial to design ML algorithms that are robust to potential worst-case noise, adversarial attacks, and highly unusual situations. Studying ML robustness will significantly help in the design of ML algorithms. In this paper, we investigate ML robustness using adversarial training in centralized and decentralized environments, where ML training and testing are conducted on one or multiple machines. In the centralized environment, we achieve test accuracies of 65.41% and 83.0% when classifying adversarial examples generated by the Fast Gradient Sign Method (FGSM) and DeepFool, respectively. Compared to existing studies, these results demonstrate improvements of 18.41% for FGSM and 47% for DeepFool. In the decentralized environment, we study Federated Learning (FL) robustness using adversarial training with independent and identically distributed (IID) and non-IID data, respectively, on the CIFAR-10 dataset. In the IID case, our experimental results demonstrate robust accuracy comparable to that obtained in the centralized environment. In the non-IID case, the natural accuracy drops from 66.23% to 57.82%, and the robust accuracy decreases by 25% and 23.4% under C&W and Projected Gradient Descent (PGD) attacks, respectively, compared to the IID case. We further propose an IID data-sharing approach, which increases the natural accuracy to 85.04% and the robust accuracy from 57% to 72% under C&W attacks and from 59% to 67% under PGD attacks.
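FGSM, one of the attacks evaluated above, has a simple closed form: perturb the input along the sign of the loss gradient. A minimal PyTorch sketch of FGSM generation and one adversarial-training step follows; the epsilon value, pixel clamping range, and loss are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """FGSM: x_adv = x + eps * sign(grad_x loss), clamped to the valid pixel range."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    """One adversarial-training step: fit the model on FGSM-perturbed inputs."""
    x_adv = fgsm_example(model, x, y, eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```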
https://arxiv.org/abs/2309.12593
Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully-differentiable end-to-end NLOS inverse rendering pipeline that self-calibrates the imaging parameters during the reconstruction of hidden scenes, using as input only the measured illumination while working both in the time and frequency domains. Our pipeline extracts a geometric representation of the hidden scene from NLOS volumetric intensities and estimates the time-resolved illumination at the relay wall produced by such geometric information using differentiable transient rendering. We then use gradient descent to optimize imaging parameters by minimizing the error between our simulated time-resolved illumination and the measured illumination. Our end-to-end differentiable pipeline couples diffraction-based volumetric NLOS reconstruction with path-space light transport and a simple ray marching technique to extract detailed, dense sets of surface points and normals of hidden scenes. We demonstrate the robustness of our method to consistently reconstruct geometry and albedo, even under significant noise levels.
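The self-calibration step amounts to ordinary gradient descent through a differentiable forward simulator. Below is a minimal PyTorch sketch with a toy stand-in for the differentiable transient renderer; the exponential-decay "renderer" and two-parameter model are assumptions for illustration only.

```python
import torch

def calibrate_imaging_params(measured, simulate, params, lr=1e-2, steps=500):
    """Self-calibration sketch: fit imaging parameters by gradient descent on the
    error between simulated and measured time-resolved illumination.
    `simulate` is any differentiable function params -> transient signal."""
    params = params.clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((simulate(params) - measured) ** 2)
        loss.backward()
        opt.step()
    return params.detach()

# Toy usage with a stand-in differentiable "renderer":
true_gain, true_offset = 2.0, 0.3
t = torch.linspace(0, 1, 128)
measured = true_gain * torch.exp(-t) + true_offset
simulate = lambda p: p[0] * torch.exp(-t) + p[1]
print(calibrate_imaging_params(measured, simulate, torch.tensor([1.0, 0.0])))
```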
https://arxiv.org/abs/2309.12047
Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation, as measured by CLIP score (realism) and disease classification confidence (alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback affirms the fidelity of the generated progressions. To the best of our knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.
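The paper's analysis views iterative refinement as gradient descent with an exponentially decayed learning rate. A toy numeric illustration of that schedule on a one-dimensional quadratic (not the paper's actual latent-space update):

```python
import numpy as np

def decayed_gradient_descent(grad, x0, eta=0.3, gamma=0.9, steps=50):
    """Iterate x_{n+1} = x_n - eta * gamma**n * grad(x_n): the step size decays
    exponentially, so the iterates converge even though they may stop slightly
    short of the exact minimizer."""
    x = float(x0)
    for n in range(steps):
        x -= eta * gamma**n * grad(x)
    return x

# Minimize (x - 3)^2 from x0 = 0; the gradient is 2 * (x - 3).
print(decayed_gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0))
```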
https://arxiv.org/abs/2309.11745
This paper presents a novel Stochastic Optimal Control (SOC) method based on Model Predictive Path Integral control (MPPI), named Stein Variational Guided MPPI (SVG-MPPI), designed to handle rapidly shifting multimodal optimal action distributions. While MPPI can find a Gaussian-approximated optimal action distribution in closed form, i.e., without iterative solution updates, it struggles with multimodal optimal distributions, such as those arising from non-convex obstacle-avoidance constraints, because a single Gaussian cannot represent such distributions well. To overcome this limitation, our method identifies a target mode of the optimal distribution and guides the solution to converge to it. In the proposed method, the target mode is roughly estimated using a modified Stein Variational Gradient Descent (SVGD) method and embedded into the MPPI algorithm to find a closed-form "mode-seeking" solution that covers only the target mode, thus preserving the fast convergence property of MPPI. Our simulation and real-world experimental results demonstrate that SVG-MPPI outperforms both the original MPPI and other state-of-the-art sampling-based SOC algorithms in terms of path-tracking and obstacle-avoidance capabilities. Source code: this https URL
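The closed-form Gaussian-approximated solution that SVG-MPPI builds on is the standard MPPI importance-weighted average. A minimal NumPy sketch of one vanilla MPPI update (the SVGD mode-seeking guidance is omitted; the noise scale and temperature are illustrative):

```python
import numpy as np

def mppi_update(u_nominal, cost_fn, n_samples=1024, sigma=0.5, lam=1.0, rng=None):
    """Vanilla MPPI step: sample perturbed action sequences, weight them by the
    exponentiated negative cost, and return the weighted mean -- the closed-form
    Gaussian-approximated solution."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + u_nominal.shape)
    samples = u_nominal + noise
    costs = np.array([cost_fn(u) for u in samples])
    w = np.exp(-(costs - costs.min()) / lam)   # softmax weights over samples
    w /= w.sum()
    return np.tensordot(w, samples, axes=1)

# Toy usage: steer a 10-step 1-D action sequence toward a target value of 1.
u0 = np.zeros(10)
print(mppi_update(u0, cost_fn=lambda u: np.sum((u - 1.0) ** 2)))
```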
https://arxiv.org/abs/2309.11040
Medical Imaging (MI) tasks, such as accelerated Parallel Magnetic Resonance Imaging (MRI), often involve reconstructing an image from noisy or incomplete measurements. This amounts to solving ill-posed inverse problems, where a satisfactory closed-form analytical solution is not available. Traditional methods such as Compressed Sensing (CS) in MRI reconstruction can be time-consuming or prone to obtaining low-fidelity images. Recently, a plethora of supervised and self-supervised Deep Learning (DL) approaches have demonstrated superior performance in inverse-problem solving, surpassing conventional methods. In this study, we propose vSHARP (variable Splitting Half-quadratic ADMM algorithm for Reconstruction of inverse Problems), a novel DL-based method for solving ill-posed inverse problems arising in MI. vSHARP utilizes the Half-Quadratic Variable Splitting method and employs the Alternating Direction Method of Multipliers (ADMM) to unroll the optimization process. For data consistency, vSHARP unrolls a differentiable gradient descent process in the image domain, while a DL-based denoiser, such as a U-Net architecture, is applied to enhance image quality. vSHARP also employs a dilated-convolution DL-based model to predict the Lagrange multipliers for the ADMM initialization. We evaluate the proposed model by applying it to the task of accelerated Parallel MRI Reconstruction on two distinct datasets. We present a comparative analysis of our experimental results with state-of-the-art approaches, highlighting the superior performance of vSHARP.
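The data-consistency stage that vSHARP unrolls is a gradient step on the acquisition model. A sketch for parallel MRI with forward operator A = M F S (sampling mask, 2-D FFT, coil sensitivities); the operator conventions and step size here are illustrative assumptions.

```python
import torch

def data_consistency_step(x, y, sens, mask, step=0.5):
    """One image-domain gradient step on ||M F S x - y||^2:
    x <- x - step * S^H F^H M^H (M F S x - y).
    Shapes: x (H, W) complex image, sens (C, H, W) coil sensitivities,
    y (C, H, W) measured (masked) k-space, mask (H, W) binary sampling mask."""
    kspace = torch.fft.fft2(sens * x)                  # F S x
    residual = mask * (mask * kspace - y)              # M^H (M F S x - y)
    grad = (sens.conj() * torch.fft.ifft2(residual)).sum(dim=0)  # S^H F^H (...)
    return x - step * grad
```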
https://arxiv.org/abs/2309.09954
This paper introduces a differentiable representation for optimization of boustrophedon path plans in convex polygons, explores an additional parameter of these path plans that can be optimized, discusses the properties of this representation that can be leveraged during the optimization process, and shows that the previously published attempt at optimization of these path plans was too coarse to be practically useful. Experiments show that this differentiable representation reproduces, with high fidelity, the scores obtained from traditional discrete representations of boustrophedon path plans. Finally, optimization via gradient descent was attempted, but found to fail because the search space is far more non-convex than previously considered in the literature. The wide range of applications for boustrophedon path plans means that this work has the potential to improve path planning efficiency in numerous areas of robotics, including mapping and search tasks using uncrewed aerial systems, environmental sampling tasks using uncrewed marine vehicles, and agricultural tasks using ground vehicles, among numerous other applications.
https://arxiv.org/abs/2309.09882
Deep learning models have achieved state-of-the-art performance in various domains, yet they are vulnerable to inputs with well-crafted but small perturbations, known as adversarial examples (AEs). Among the many strategies for improving model robustness against AEs, Projected Gradient Descent (PGD) based adversarial training is one of the most effective. Unfortunately, the prohibitive computational overhead of generating sufficiently strong AEs, due to the inner maximization of the loss function, can make regular PGD adversarial training impractical for larger and more complicated models. In this paper, we propose that the adversarial loss can be approximated by a partial sum of its Taylor series. We then approximate the gradient of the adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models. Extensive experiments demonstrate that this efficiency improvement comes with little to no loss in accuracy on natural and adversarial examples: our method saves up to 60\% of the training time while achieving comparable model test accuracy on the MNIST, CIFAR-10, and CIFAR-100 datasets.
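The generic first-order version of this approximation can be written out explicitly; for a sign-based perturbation of radius $\epsilon$ (an illustrative special case -- the paper's exact partial-sum construction may retain higher-order terms):

```latex
L(x+\delta, y;\theta) \;\approx\; L(x, y;\theta) + \delta^{\top} \nabla_x L(x, y;\theta),
\qquad
\delta = \epsilon\,\mathrm{sign}\!\big(\nabla_x L(x, y;\theta)\big)
\;\Rightarrow\;
L(x+\delta, y;\theta) \;\approx\; L(x, y;\theta) + \epsilon\,\lVert \nabla_x L(x, y;\theta) \rVert_1 .
```

In this view, the expensive inner maximization is replaced by one gradient evaluation at the clean input, which is the source of the training-time savings.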
https://arxiv.org/abs/2309.09464
Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.
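Convexity is what makes the projected subgradient method applicable here. A generic NumPy sketch of the scheme with a diminishing step size (the toy objective and projection below stand in for the learned regularizer and imaging constraints):

```python
import numpy as np

def projected_subgradient_descent(subgrad, project, x0, steps=200, t0=1.0):
    """Generic projected subgradient method: x_{k+1} = P_C(x_k - t_k g_k) with
    t_k = t0 / sqrt(k+1). For a convex objective, as in the CLEAR-informed
    model, the iterates converge to the constrained minimum."""
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        g = subgrad(x)
        x = project(x - (t0 / np.sqrt(k + 1)) * g)
    return x

# Toy usage: minimize |x - 2| subject to x in [0, 1]; the solution is x = 1.
print(projected_subgradient_descent(
    subgrad=lambda x: np.sign(x - 2.0),
    project=lambda x: np.clip(x, 0.0, 1.0),
    x0=np.array(0.0)))
```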
https://arxiv.org/abs/2309.09250
In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP), recall at k (R@k), and normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank loss optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. First, we propose a general surrogate for the ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upper bound on rank losses and ensures robust training. Second, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large-scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at this https URL. Code will be released at this https URL.
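The non-differentiability comes from the rank operator, which counts hard comparisons between scores. A common sigmoid relaxation makes it smooth; the PyTorch sketch below shows that generic relaxation (SupRank's actual surrogate additionally upper-bounds the rank losses, which this toy version does not guarantee):

```python
import torch

def smooth_rank(scores, tau=0.01):
    """Sigmoid-relaxed rank: rank_i ~ 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    Differentiable in the scores, so rank-based losses admit SGD."""
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)          # diff[i, j] = s_j - s_i
    return 1.0 + torch.sigmoid(diff / tau).sum(dim=1) - 0.5   # drop the self term

s = torch.tensor([0.9, 0.1, 0.5], requires_grad=True)
print(smooth_rank(s))   # approximately [1., 3., 2.], with usable gradients
```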
https://arxiv.org/abs/2309.08250
This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation, which can reduce word error rates (WERs) with only a small number of labelled speech samples and no gradient descent. Language-level adaptation experiments using Chinese dialects show that when applying SICL to isolated-word ASR, consistent and considerable relative WER reductions can be achieved using Whisper models of any size on two dialects, 32.3% on average. A k-nearest-neighbours-based in-context example selection technique can be applied to further improve the efficiency of SICL, increasing the average relative WER reduction to 36.4%. The findings are further verified on speaker adaptation and continuous speech recognition tasks, both of which achieve considerable relative WER reductions. Detailed quantitative analyses are also provided to shed light on SICL's adaptability to phonological variances and dialect-specific lexical nuances.
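The kNN-based example selection reduces to a nearest-neighbour lookup in some embedding space. A minimal NumPy sketch follows; the embedding source and Euclidean metric are illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np

def select_in_context_examples(query_emb, example_embs, k=3):
    """Pick the k labelled speech samples whose embeddings are closest to the
    test utterance; their audio/label pairs are then prepended to the prompt."""
    dists = np.linalg.norm(example_embs - query_emb, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: 5 labelled examples with 4-dim embeddings.
rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 4))
print(select_in_context_examples(bank[2] + 0.01, bank))  # index 2 ranks first
```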
https://arxiv.org/abs/2309.07081
This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots in order to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for multi-robot IPP. The proposed approach is demonstrated to be fast and accurate on real-world data.
https://arxiv.org/abs/2309.07050
Accurate disturbance estimation is essential for safe robot operations. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, shows promise in this context. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by deriving the second-order derivatives of MHE, referred to as the MHE Hessian. Remarkably, we establish that much of the computation already used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. This offers linear computational complexity relative to the MHE horizon. Through validation on an open-source real quadrotor flight dataset, our approach demonstrates data-efficient training (<5 min) and outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy while utilizing only 1.4% of its network parameters. Furthermore, our method showcases enhanced robustness to network initialization compared to its gradient descent counterpart.
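What an explicit Hessian buys is a second-order update. A tiny NumPy sketch of the damped-Newton subproblem at the core of trust-region-style methods (a generic stand-in, not NeuroMHE's actual policy update; the damping term plays the role of the trust-region constraint):

```python
import numpy as np

def damped_newton_step(grad, hess, lam=1e-2):
    """Solve (H + lam * I) p = -g for the update direction p; larger lam shrinks
    the step, mimicking a tighter trust region."""
    H = hess + lam * np.eye(len(grad))
    return np.linalg.solve(H, -grad)

# Toy quadratic f(x) = x^T A x / 2 with A = diag(1, 10); one undamped step hits 0.
A = np.diag([1.0, 10.0])
x = np.array([2.0, -1.0])
print(x + damped_newton_step(A @ x, A, lam=0.0))
```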
https://arxiv.org/abs/2309.05955
Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound outperforms the established baseline of rounding-to-nearest (RTN) and competes impressively against recent methods, without introducing additional inference overhead. The source code will be publicly available at this https URL soon.
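The rounding task can be phrased as tuning a small perturbation v in [-0.5, 0.5] added before round(), updated with the sign of its gradient. A PyTorch sketch of one such step; the mean-squared reconstruction loss is a stand-in (SignRound tunes block-wise against layer-output error), and the learning rate and scale are illustrative.

```python
import torch

def signround_step(w, scale, v, lr=5e-3):
    """One signed-gradient update of the rounding perturbation v (clipped to
    [-0.5, 0.5]). Quantization: w_q = round(w / scale + v) * scale, with a
    straight-through estimator so gradients flow through round()."""
    v = v.detach().requires_grad_(True)
    q = w / scale + v
    q_ste = q + (torch.round(q) - q).detach()   # straight-through round()
    w_q = q_ste * scale
    loss = torch.mean((w_q - w) ** 2)           # stand-in reconstruction loss
    loss.backward()
    with torch.no_grad():
        return (v - lr * v.grad.sign()).clamp_(-0.5, 0.5)

w, v = torch.randn(8), torch.zeros(8)
for _ in range(400):                            # the paper reports ~400 steps
    v = signround_step(w, scale=0.1, v=v)
print(v)
```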
https://arxiv.org/abs/2309.05516
In many imaging applications where segmented features (e.g. blood vessels) are further used in other numerical simulations (e.g. finite element analysis), the obtained surfaces do not have resolutions fine enough for the task, so increasing the resolution of such surfaces becomes crucial. This paper proposes a new variational model for solving this problem, based on an Euler-elastica regulariser. Further, we propose and implement two numerical algorithms for solving the model: a projected gradient descent method and the alternating direction method of multipliers. Numerical experiments on real-life examples (including two taken from the outputs of another variational model) illustrate the method's effectiveness. The advantages of the new model are shown through quantitative comparisons of the standard deviations of Gaussian and mean curvatures, from the viewpoint of discrete geometry.
https://arxiv.org/abs/2309.05071
Physics-informed Neural Networks (PINNs) are among the most preeminent solvers of the Navier-Stokes equations, which are widely used as the governing equations of blood flow. However, current approaches, relying on the full Navier-Stokes equations, are impractical for ultrafast Doppler ultrasound, the state-of-the-art technique for depicting complex blood flow dynamics \emph{in vivo}, which acquires thousands of frames (or timestamps) per second. In this article, we first propose a novel training framework for PINNs that solves the Navier-Stokes equations by discretizing them into steady states and sequentially solving the steady-state equations with transfer learning. This training framework is coined SeqPINN. Building on the success of SeqPINN, we adopt the idea of averaged constant stochastic gradient descent (SGD) for initialization and propose a parallel training scheme for all timestamps. To ensure an initialization that generalizes well, we borrow the concept of Stochastic Weight Averaging Gaussian to perform uncertainty estimation as an indicator of the initialization's generalizability. This algorithm, named SP-PINN, further expedites the training of PINNs while achieving accuracy comparable to SeqPINN. Finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels are used to evaluate the performance of SeqPINN and SP-PINN. Results show that both SeqPINN and SP-PINN are manyfold faster than the original design of PINN, while respectively achieving Root Mean Square Errors (RMSEs) of 1.01 cm/s and 1.26 cm/s on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate blood vessel when recovering blood flow velocities.
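The SeqPINN schedule amounts to warm-starting each timestamp's steady-state solve from the previous one. A PyTorch sketch of that loop; the `loss_at` PDE-residual function, step counts, and optimizer are placeholders, not the paper's implementation.

```python
import torch

def train_sequentially(model, timestamps, loss_at, steps_per_t=200, lr=1e-3):
    """Solve the steady-state problem at each timestamp in order, reusing the
    network weights from the previous timestamp as initialization (transfer
    learning), instead of fitting one network over all of space-time at once."""
    solutions = []
    for t in timestamps:
        opt = torch.optim.Adam(model.parameters(), lr=lr)  # warm start: same weights
        for _ in range(steps_per_t):
            opt.zero_grad()
            loss = loss_at(model, t)        # steady-state PDE residual at time t
            loss.backward()
            opt.step()
        solutions.append({k: p.clone() for k, p in model.state_dict().items()})
    return solutions
```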
https://arxiv.org/abs/2309.04755
Artificial neural networks have revolutionized machine learning in recent years, but a complete theoretical framework for their learning process is still lacking. Substantial progress has been made for infinitely wide networks. In this regime, two disparate theoretical frameworks have been used, in which the network's output is described using kernels: one framework is based on the Neural Tangent Kernel (NTK) which assumes linearized gradient descent dynamics, while the Neural Network Gaussian Process (NNGP) kernel assumes a Bayesian framework. However, the relation between these two frameworks has remained elusive. This work unifies these two distinct theories using a Markov proximal learning model for learning dynamics in an ensemble of randomly initialized infinitely wide deep networks. We derive an exact analytical expression for the network input-output function during and after learning, and introduce a new time-dependent Neural Dynamical Kernel (NDK) from which both NTK and NNGP kernels can be derived. We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning. In the initial gradient-driven learning phase, the dynamics is dominated by deterministic gradient descent, and is described by the NTK theory. This phase is followed by the diffusive learning stage, during which the network parameters sample the solution space, ultimately approaching the equilibrium distribution corresponding to NNGP. Combined with numerical evaluations on synthetic and benchmark datasets, we provide novel insights into the different roles of initialization, regularization, and network depth, as well as phenomena such as early stopping and representational drift. This work closes the gap between the NTK and NNGP theories, providing a comprehensive framework for understanding the learning process of deep neural networks in the infinite width limit.
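For reference, the two kernels being unified have standard definitions: the NTK is the Gram matrix of parameter gradients that governs linearized gradient-descent dynamics, while the NNGP kernel is the prior covariance of network outputs in the Bayesian view,

```latex
\Theta_{\mathrm{NTK}}(x, x') \;=\; \nabla_{\theta} f_{\theta}(x)^{\top}\, \nabla_{\theta} f_{\theta}(x'),
\qquad
K_{\mathrm{NNGP}}(x, x') \;=\; \mathbb{E}_{\theta \sim p(\theta)}\big[\, f_{\theta}(x)\, f_{\theta}(x') \,\big].
```

The time-dependent NDK introduced in the paper yields both: the NTK limit in the early gradient-driven phase and the NNGP limit as the diffusive phase approaches equilibrium.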
https://arxiv.org/abs/2309.04522
This article describes a multi-modal method that uses simulated Lidar data obtained via ray tracing, together with an image pixel loss from differentiable rendering, to optimize an object's position with respect to an observer or some reference objects in a computer graphics scene. Object position optimization is performed using gradient descent, with the loss function influenced by both modalities. Typical object placement optimization uses image pixel loss with differentiable rendering only; this work shows that adding a second modality (Lidar) leads to faster convergence. This method of fusing sensor input is potentially useful for autonomous vehicles, as it can be used to establish the locations of multiple actors in a scene. The article also presents a method for simulating multiple types of data for use in training autonomous vehicles.
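The fused objective is simply a weighted sum of the two modality losses, differentiated with respect to the object pose. A PyTorch sketch; both renderers are assumed to be differentiable functions of position, and the weights and optimizer settings are illustrative.

```python
import torch

def optimize_position(pos, render_pixels, render_lidar, target_img, target_pts,
                      w_img=1.0, w_lidar=1.0, lr=1e-2, steps=300):
    """Gradient descent on object position with a multi-modal loss mixing
    differentiable-rendering pixel error and simulated-Lidar depth error."""
    pos = pos.clone().requires_grad_(True)
    opt = torch.optim.Adam([pos], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (w_img * torch.mean((render_pixels(pos) - target_img) ** 2)
                + w_lidar * torch.mean((render_lidar(pos) - target_pts) ** 2))
        loss.backward()
        opt.step()
    return pos.detach()
```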
https://arxiv.org/abs/2309.03177
We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear and tensor-valued functions that expresses functions invariant to these subgroups in a principled manner. The structure of the architecture enables us to leverage multi-armed bandit algorithms and gradient descent to efficiently optimize over the linear and the tensor-valued functions, respectively, and to infer the symmetry that is ultimately learnt. We also discuss the necessity of the tensor-valued functions in the architecture. Experiments on image-digit sum and polynomial regression tasks demonstrate the effectiveness of our approach.
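The outer discrete choice over candidate subgroups is what the bandit handles, while gradient descent fits the tensor-valued functions for the chosen subgroup. A tiny NumPy sketch of a UCB1-style arm selection (a generic bandit rule; the paper's exact algorithm and reward definition may differ):

```python
import numpy as np

def ucb_pick(counts, means, t, c=2.0):
    """UCB1 selection over candidate subgroups: pick the subgroup whose
    invariant model currently scores best, plus an exploration bonus; the
    inner loop would then run gradient descent on that subgroup's functions."""
    scores = means + c * np.sqrt(np.log(t + 1) / np.maximum(counts, 1))
    scores[counts == 0] = np.inf          # try every subgroup at least once
    return int(np.argmax(scores))

# Toy usage: 4 candidate subgroups after 10 rounds.
print(ucb_pick(np.array([4, 3, 3, 0]), np.array([0.5, 0.7, 0.2, 0.0]), t=10))
```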
https://arxiv.org/abs/2309.02898
The distributed data analytics system Spark is a common choice for processing massive volumes of heterogeneous data, but it is challenging to tune its parameters to achieve high performance. Recent studies employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that deals with all three issues simultaneously. First, we introduce a generalized tuning formulation, which can conveniently support multiple tuning goals and constraints, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid the high overhead of the additional offline evaluations used in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and a meta-learning method. We have implemented this framework as an independent cloud service and applied it to the data platform at Tencent. Empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% in memory cost and 34.93% in CPU cost on 25K in-production tasks within 20 iterations.
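A safe BO iteration can be sketched with a stock Gaussian process: fit a surrogate to observed (configuration, cost) pairs, then choose the best candidate whose pessimistic cost bound stays below a safety threshold. This is a simple stand-in for the paper's safe-region model, with an illustrative acquisition rule and toy data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def suggest_config(X_obs, y_obs, candidates, safety_threshold):
    """Safe-BO sketch: among candidates whose GP upper bound (mean + 2*std)
    is below the threshold, pick the one with the lowest predicted cost."""
    gp = GaussianProcessRegressor().fit(X_obs, y_obs)
    mu, std = gp.predict(candidates, return_std=True)
    safe = mu + 2.0 * std < safety_threshold   # pessimistic safety check
    if not safe.any():
        return X_obs[int(np.argmin(y_obs))]    # fall back to best known config
    idx = np.where(safe)[0]
    return candidates[idx[np.argmin(mu[idx])]]

# Toy usage: a 1-D tuning knob with three observed job costs.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([5.0, 3.0, 4.0])
print(suggest_config(X, y, np.linspace(0.5, 3.5, 31)[:, None], safety_threshold=6.0))
```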
https://arxiv.org/abs/2309.01901
When using Stochastic Gradient Descent (SGD) to train machine learning models, it is often crucial to provide the model with examples sampled at random from the dataset. However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient. A recent work \cite{corgi} proposed an online shuffling algorithm called CorgiPile, which greatly improves the efficiency of data access at the cost of some performance loss, which is particularly apparent for large datasets stored in homogeneous shards (e.g., video datasets). In this paper, we introduce a novel two-step partial data shuffling strategy for SGD that combines an offline iteration of the CorgiPile method with a subsequent online iteration. Our approach enjoys the best of both worlds: it performs similarly to SGD with random access (even for homogeneous data) without compromising the data access efficiency of CorgiPile. We provide a comprehensive theoretical analysis of the convergence properties of our method and demonstrate its practical advantages through experimental results.
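A minimal Python sketch of the two-step idea: an offline CorgiPile-style pass (shuffle block order, then shuffle within a merged buffer) materializes a reordered copy, and the same buffered shuffle is re-applied online while streaming. Block and buffer sizes here are illustrative, not the paper's parameters.

```python
import random

def two_step_shuffle(shards, buffer_blocks=4, seed=0):
    """Two-step partial shuffle: an offline buffered block shuffle followed by
    an online one over the reordered blocks, so no full random access is needed."""
    rng = random.Random(seed)

    def corgipile_pass(blocks):
        blocks = list(blocks)
        rng.shuffle(blocks)                       # step 1: shuffle block order
        for i in range(0, len(blocks), buffer_blocks):
            buf = [x for b in blocks[i:i + buffer_blocks] for x in b]
            rng.shuffle(buf)                      # step 2: shuffle within buffer
            yield buf

    offline = list(corgipile_pass(shards))        # offline pass, written to storage
    for buf in corgipile_pass(offline):           # online pass during training
        yield from buf

# Toy usage: 4 homogeneous shards of 4 examples each.
shards = [[i * 4 + j for j in range(4)] for i in range(4)]
print(list(two_step_shuffle(shards)))
```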
https://arxiv.org/abs/2309.01640