Deformable objects present several challenges to the field of robotic manipulation. One of the tasks that best encapsulates the difficulties arising from non-rigid behavior is shape control, which requires driving an object to a desired shape. While shape-servoing methods have proven successful in contexts with approximately linear behavior, they can fail in tasks with more complex dynamics. We investigate an alternative approach, using offline RL to solve a planar shape control problem for a Deformable Linear Object (DLO). To evaluate the effect of material properties, two DLOs are tested, namely a soft rope and an elastic cord. We frame this task as a goal-conditioned offline RL problem and aim to learn to generalize to unseen goal shapes. Data collection and augmentation procedures are proposed to limit the amount of experimental data that needs to be collected with the real robot. We evaluate how much augmentation is needed to achieve the best results, and test the effect of regularization through behavior cloning in the TD3+BC algorithm. Finally, we show that the proposed approach outperforms a shape-servoing baseline in a curvature inversion experiment.
https://arxiv.org/abs/2403.10290
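The behavior-cloning regularization tested above is the defining feature of TD3+BC. A minimal numpy sketch of the actor objective, with the scale-normalized weight lambda = alpha / mean(|Q|) from the TD3+BC formulation (array shapes here are illustrative):

```python
import numpy as np

def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC actor objective (sketch): maximize Q while staying close to the
    dataset actions via a behavior-cloning MSE term. The weight
    lambda = alpha / mean(|Q|) normalizes away the critic's scale, so alpha
    alone controls the RL-vs-BC tradeoff."""
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    return -lam * np.mean(q_values) + bc_term
```

Raising alpha weights the critic more heavily; alpha = 0 reduces the update to pure behavior cloning, which is the regularization axis the experiments above vary.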
While fine-tuning is a de facto standard method for training deep neural networks, it still suffers from overfitting when using small target datasets. Previous methods improve fine-tuning performance by maintaining knowledge of the source datasets or introducing regularization terms such as contrastive loss. However, these methods require auxiliary source information (e.g., source labels or datasets) or heavy additional computations. In this paper, we propose a simple method called adaptive random feature regularization (AdaRand). AdaRand helps the feature extractors of training models to adaptively change the distribution of feature vectors for downstream classification tasks without auxiliary source information and with reasonable computation costs. To this end, AdaRand minimizes the gap between feature vectors and random reference vectors that are sampled from class conditional Gaussian distributions. Furthermore, AdaRand dynamically updates the conditional distribution to follow the currently updated feature extractors and balance the distance between classes in feature spaces. Our experiments show that AdaRand outperforms the other fine-tuning regularization, which requires auxiliary source information and heavy computation costs.
https://arxiv.org/abs/2403.10097
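The core of AdaRand described above can be sketched in a few lines; the squared-distance gap and the momentum-based update of the class-conditional means are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def adarand_penalty(features, labels, class_means, sigma=1.0):
    """AdaRand-style penalty (sketch): pull each feature vector toward a random
    reference vector drawn from its class-conditional Gaussian N(mu_y, sigma^2 I)."""
    refs = class_means[labels] + sigma * rng.standard_normal(features.shape)
    return np.mean(np.sum((features - refs) ** 2, axis=1))

def update_class_means(class_means, features, labels, momentum=0.9):
    """Dynamic update (sketch): move each class mean toward the current batch's
    features so the conditional distributions track the evolving extractor."""
    new_means = class_means.copy()
    for c in np.unique(labels):
        batch_mean = features[labels == c].mean(axis=0)
        new_means[c] = momentum * class_means[c] + (1 - momentum) * batch_mean
    return new_means
```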
In the wake of the global spread of monkeypox, accurate disease recognition has become crucial. This study introduces an improved SE-InceptionV3 model, embedding the SENet module and incorporating L2 regularization into the InceptionV3 framework to enhance monkeypox disease detection. Utilizing the Kaggle monkeypox dataset, which includes images of monkeypox and similar skin conditions, our model demonstrates a noteworthy accuracy of 96.71% on the test set, outperforming conventional methods and deep learning models. The SENet module's channel attention mechanism significantly elevates feature representation, while L2 regularization ensures robust generalization. Extensive experiments validate the model's superiority in precision, recall, and F1 score, highlighting its effectiveness in differentiating monkeypox lesions in diverse and complex cases. The study not only provides insights into the application of advanced CNN architectures in medical diagnostics but also opens avenues for further research in model optimization and hyperparameter tuning for enhanced disease recognition. this https URL
https://arxiv.org/abs/2403.10087
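A minimal numpy sketch of the two ingredients named above, the SE channel-attention gate and the L2 penalty (weight shapes and the absence of biases are simplifying assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel attention (sketch, no biases).
    feature_map: (C, H, W); w1: (C//r, C); w2: (C, C//r) for reduction ratio r."""
    squeezed = feature_map.mean(axis=(1, 2))                # squeeze: global average pool
    excited = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))  # excite: FC-ReLU-FC-sigmoid
    return feature_map * excited[:, None, None]             # channel-wise reweighting

def l2_penalty(weights, lam=1e-4):
    """The L2 regularization term added alongside the SE modules."""
    return lam * sum(np.sum(w ** 2) for w in weights)
```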
Dataset distillation (DD) allows datasets to be distilled to fractions of their original size while preserving the rich distributional information so that models trained on the distilled datasets can achieve a comparable accuracy while saving significant computational loads. Recent research in this area has been focusing on improving the accuracy of models trained on distilled datasets. In this paper, we aim to explore a new perspective of DD. We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and meanwhile acquire better adversarial robustness. We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training. Extensive empirical experiments suggest that our method not only outperforms standard adversarial training on both accuracy and robustness with less computation overhead but is also capable of generating robust distilled datasets that can withstand various adversarial attacks.
https://arxiv.org/abs/2403.10045
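Curvature regularization here can be illustrated with a finite-difference proxy in the spirit of regularizers such as CURE; the paper's distillation-specific formulation may differ:

```python
import numpy as np

def curvature_penalty(loss_grad, x, h=1e-2):
    """Finite-difference curvature proxy (sketch): penalize how much the loss
    gradient changes along d = sign(grad), i.e. ||grad(x + h*d) - grad(x)||^2.
    Small values indicate a locally flat, more adversarially robust loss surface,
    at the cost of one extra gradient evaluation instead of full adversarial training."""
    g = loss_grad(x)
    d = np.sign(g)
    return np.sum((loss_grad(x + h * d) - g) ** 2)
```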
In image-guided liver surgery, 3D-3D non-rigid registration methods play a crucial role in estimating the mapping between the preoperative model and the intraoperative surface represented as point clouds, addressing the challenge of tissue deformation. Typically, these methods incorporate a biomechanical model, represented as a finite element model (FEM), used to regularize a surface matching term. This paper introduces a novel 3D-3D non-rigid registration method. In contrast to the preceding techniques, our method uniquely incorporates the FEM within the surface matching term itself, ensuring that the estimated deformation maintains geometric consistency throughout the registration process. Additionally, we eliminate the need to determine zero-boundary conditions and applied force locations in the FEM. We achieve this by integrating soft springs into the stiffness matrix and allowing forces to be distributed across the entire liver surface. To further improve robustness, we introduce a regularization technique focused on the gradient of the force magnitudes. This regularization imposes spatial smoothness and helps prevent the overfitting of irregular noise in intraoperative data. Optimization is achieved through an accelerated proximal gradient algorithm, further enhanced by our proposed method for determining the optimal step size. Our method is evaluated and compared to both a learning-based method and a traditional method that features FEM regularization using data collected on our custom-developed phantom, as well as two publicly available datasets. Our method consistently outperforms or is comparable to the baseline techniques. Both the code and dataset will be made publicly available.
https://arxiv.org/abs/2403.09964
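The accelerated proximal gradient optimizer mentioned above follows the standard FISTA pattern; a generic sketch, with the liver-registration objective replaced by a toy l1-regularized problem:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(grad_f, prox, x0, step, n_iters=100):
    """Accelerated proximal gradient (FISTA, sketch): a gradient step on the
    smooth data term, a prox step on the nonsmooth regularizer, and Nesterov
    momentum between iterates."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_new = prox(y - step * grad_f(y), step)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

In the paper the step size is itself chosen by a proposed method; here it is simply passed in.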
Multi-label imbalanced classification poses a significant challenge in machine learning, particularly evident in bioacoustics where animal sounds often co-occur, and certain sounds are much less frequent than others. This paper focuses on the specific case of classifying anuran species sounds using the dataset AnuraSet, that contains both class imbalance and multi-label examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a framework that leverages mixing regularization methods Mixup, Manifold Mixup, and MultiMix. Experimental results show that these methods, individually, may lead to suboptimal results; however, when applied randomly, with one selected at each training iteration, they prove effective in addressing the mentioned challenges, particularly for rare classes with few occurrences. Further analysis reveals that Mix2 is also proficient in classifying sounds across various levels of class co-occurrences.
https://arxiv.org/abs/2403.09598
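The per-iteration random choice at the heart of Mix2 is easy to sketch; only input-space Mixup is shown, since Manifold Mixup and MultiMix act on hidden representations:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.2):
    """Input-space Mixup: convex combination of the batch with a shuffled copy."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

def mix2_step(x, y, strategies):
    """Mix2 (sketch): instead of committing to a single mixing regularizer,
    draw one uniformly at random at each training iteration."""
    chosen = strategies[rng.integers(len(strategies))]
    return chosen(x, y)
```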
Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.
https://arxiv.org/abs/2403.09516
Medical data often exhibits distribution shifts, which cause test-time performance degradation for deep learning models trained using standard supervised learning pipelines. This challenge is addressed in the field of Domain Generalization (DG), with the sub-field of Single Domain Generalization (SDG) being particularly interesting due to the privacy- or logistics-related issues often associated with medical data. Existing disentanglement-based SDG methods heavily rely on structural information embedded in segmentation masks; however, classification labels do not provide such dense information. This work introduces a novel SDG method aimed at medical image classification that leverages channel-wise contrastive disentanglement. It is further enhanced with reconstruction-based style regularization to ensure extraction of distinct style and structure feature representations. We evaluate our method on the complex task of multicenter histopathology image classification, comparing it against state-of-the-art (SOTA) SDG baselines. Results demonstrate that our method surpasses the SOTA by a margin of 1% in average accuracy while also showing more stable performance. This study highlights the importance and challenges of exploring SDG frameworks in the context of the classification task. The code is publicly available at this https URL
https://arxiv.org/abs/2403.09400
Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Diffusion model-based approaches offer a potential solution to this problem, although they are computationally expensive and suffer from the training-sampling discrepancy. This study introduces a novel Cascaded Diffusion with Discrepancy Mitigation (CDDM) framework, comprising low-quality image generation in latent space and high-quality image generation in pixel space, with data consistency and discrepancy mitigation performed in a one-step reconstruction process. The cascaded framework minimizes computational costs by moving some inference steps from pixel space to latent space. The discrepancy mitigation technique addresses the training-sampling gap induced by data consistency, ensuring the data distribution is close to the original manifold. A specialized Alternating Direction Method of Multipliers (ADMM) is employed to process image gradients in separate directions, offering a more targeted approach to regularization. Experimental results across two datasets demonstrate CDDM's superior performance in high-quality image generation with clearer boundaries compared to existing methods, highlighting the framework's computational efficiency.
https://arxiv.org/abs/2403.09355
We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce a trainable parameter as a weighting factor for the Jacobian at each triangle to adaptively change local shapes while maintaining global correspondences and facial features. Moreover, to ensure the coherence of the resulting shape and appearance from different viewpoints, we use pretrained image diffusion models for differentiable rendering with regularization terms to refine the deformation under text guidance. Extensive experiments demonstrate that our method can generate diverse head avatars with an articulated mesh that can be edited seamlessly in 3D graphics software, facilitating downstream applications such as more efficient animation with inherited blend shapes and semantic consistency.
https://arxiv.org/abs/2403.09326
We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such implicit regularization can be described by a Support Vector Machine (SVM) problem with respect to the attention weights. This finding contrasts with prior results showing that the gradient descent induces an implicit regularization on the Frobenius norm on the product weight matrix when the key and query matrices are combined into a single weight matrix for training. For diagonal key and query matrices, our analysis builds upon the reparameterization technique and exploits approximate KKT conditions of the SVM associated with the classification data. Moreover, the results are extended to general weights configurations given proper alignment of the weight matrices' singular spaces with the data features at initialization.
https://arxiv.org/abs/2403.08699
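The contrast drawn above is between two matrix norms of the key-query product; a one-function numpy illustration:

```python
import numpy as np

def nuclear_norm(m):
    """Sum of singular values of m: the quantity the paper shows is implicitly
    minimized over the key-query product when the two matrices are trained
    separately (versus the Frobenius norm when they are merged into one matrix)."""
    return float(np.linalg.svd(m, compute_uv=False).sum())
```

For diag(3, 4), the nuclear norm is 3 + 4 = 7 while the Frobenius norm is sqrt(9 + 16) = 5; the two norms induce different low-rank biases.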
Federated learning (FL) empowers privacy preservation in model training by only exposing users' model gradients. Yet, FL users are susceptible to gradient inversion (GI) attacks, which can reconstruct ground-truth training data such as images from model gradients. However, reconstructing high-resolution images with existing GI attacks faces two challenges: inferior accuracy and slow convergence, especially when the context is complicated, e.g., when the training batch size is much greater than 1 on each FL user. To address these challenges, we present a Robust, Accurate and Fast-convergent GI attack algorithm, called RAF-GI, with two components: 1) an Additional Convolution Block (ACB), which can restore labels with up to 20% improvement compared with existing works; 2) a Total variance, three-channel mEan and cAnny edge detection regularization term (TEA), a white-box attack strategy that reconstructs images based on labels inferred by ACB. Moreover, RAF-GI is robust in that it can still accurately reconstruct ground-truth data when the users' training batch size is no more than 48. Our experimental results show that RAF-GI can cut time costs by 94% while achieving superb inversion quality on the ImageNet dataset. Notably, with a batch size of 1, RAF-GI exhibits a 7.89 higher Peak Signal-to-Noise Ratio (PSNR) than the state-of-the-art baselines.
https://arxiv.org/abs/2403.08383
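The "T" in TEA is standard total variation; a sketch of that single component (the per-channel mean and Canny edge terms are omitted):

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation of an (H, W) image: the smoothness prior
    commonly used in gradient-inversion reconstruction objectives."""
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()
```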
Data augmentation is arguably the most important regularization technique commonly used to improve generalization performance of machine learning models. It primarily involves the application of appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial and error procedures for creating and testing different candidate augmentations and their hyperparameters manually. Automated data augmentation methods aim to automate the process. State-of-the-art approaches typically rely on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration and data synthesis techniques. We present extensive discussion of techniques for realizing each of the major subtasks of the data augmentation process: search space design, hyperparameter optimization and model evaluation. Finally, we carried out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.
https://arxiv.org/abs/2403.08352
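The search loop underlying most AutoML augmentation methods reduces to sampling policies from a search space and scoring them; a deliberately minimal random-search sketch (the operation names and evaluator are placeholders, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search_augmentation(candidates, evaluate, n_trials=20):
    """Minimal AutoML-style augmentation search (sketch): sample an
    (operation, magnitude) pair, score it with a validation-based evaluator,
    and keep the best policy found."""
    best, best_score = None, -np.inf
    for _ in range(n_trials):
        op = candidates[rng.integers(len(candidates))]
        magnitude = rng.uniform(0.0, 1.0)
        score = evaluate(op, magnitude)
        if score > best_score:
            best, best_score = (op, magnitude), score
    return best, best_score
```

State-of-the-art methods replace this random proposal step with smarter hyperparameter optimization, which is the subject of the survey's middle sections.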
Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on the sinogram by innovatively updating the response inconsistency compensation coefficients of detector units, which is achieved by employing the group sparse constraint and the projection-view direction sparse constraint on the stripe artifacts. Simultaneously, we apply a sparse constraint on the reconstructed image to further rectify ring artifacts in the image domain. The key advantage of the proposed method lies in considering the relationship between the response inconsistency compensation coefficients of the detector units and the projection views, which enables a more accurate correction of the response of the detector units. An alternating minimization method is designed to solve the model. Comparative experiments on real photon-counting detector data demonstrate that the proposed method not only surpasses existing methods in removing ring artifacts but also excels in preserving structural details and image fidelity.
https://arxiv.org/abs/2403.08247
In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger number of unlabeled faces exists in the real world. We explore the learning strategy of these unlabeled facial images through self-supervised pretraining to transfer generalized face recognition performance. Moreover, motivated by one recent finding, namely that the face saliency area is critical for face recognition, instead of utilizing randomly cropped blocks of images for constructing augmentations in pretraining, we utilize patches localized by extracted facial landmarks. This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition. We also incorporate two landmark-specific augmentations which introduce more diversity of landmark information to further regularize the learning. With the learned landmark-based facial representations, we further adapt the representation for face recognition with regularization mitigating variations in landmark positions. Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks, especially in more challenging few-shot scenarios.
https://arxiv.org/abs/2403.08161
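The landmark-localized patches that replace random crops can be sketched directly (patch size and boundary clipping are assumptions):

```python
import numpy as np

def landmark_patches(image, landmarks, size=16):
    """Sketch of LAFS-style view construction: crop fixed-size patches centered
    on detected facial landmarks instead of random crops, so pretraining sees
    the saliency regions that matter for face recognition."""
    half = size // 2
    h, w = image.shape[:2]
    patches = []
    for x, y in landmarks:
        x = int(np.clip(x, half, w - half))  # keep the crop inside the image
        y = int(np.clip(y, half, h - half))
        patches.append(image[y - half:y + half, x - half:x + half])
    return np.stack(patches)
```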
Facial attribute editing using generative models can impair automated face recognition. This degradation persists even with recent identity-preserving models such as InstantID. To mitigate this issue, we propose two techniques that perform local and global attribute editing. Local editing operates on the finer details via a regularization-free method based on ControlNet conditioned on depth maps and auxiliary semantic segmentation masks. Global editing operates on coarser details via a regularization-based method guided by a custom loss and regularization set. In this work, we empirically ablate twenty-six facial semantic, demographic, and expression-based attributes altered using state-of-the-art generative models, and evaluate them using ArcFace and AdaFace matchers on the CelebA, CelebAMaskHQ and LFW datasets. Finally, we use LLaVA, a vision-language framework, for attribute prediction to validate our editing techniques. Our methods outperform SoTA (BLIP, InstantID) at facial editing while retaining identity.
https://arxiv.org/abs/2403.08092
An important and difficult task in code-switched speech recognition is to recognize the language, as many words in two languages can sound similar, especially in some accents. We focus on improving the performance of end-to-end Automatic Speech Recognition models by conditioning transformer layers on the language ID of words and characters in the output, in a per-layer supervised manner. To this end, we propose two methods of introducing language-specific parameters and explainability in the multi-head attention mechanism, and implement a Temporal Loss that helps maintain continuity in input alignment. Despite being unable to reduce WER significantly, our method shows promise in predicting the correct language from just spoken data. We introduce regularization in the language prediction by dropping the LID in the sequence, which helps align long repeated output sequences.
https://arxiv.org/abs/2403.08011
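The LID-dropping regularizer is simple to sketch (the token names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_lid(tokens, lid_tokens, p=0.3):
    """Sketch of the LID-dropping regularizer: randomly remove language-ID
    tokens from the target sequence with probability p, which the paper reports
    helps align long repeated output sequences."""
    return [t for t in tokens if t not in lid_tokens or rng.random() >= p]
```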
Few-Shot Class-Incremental Learning (FSCIL) enables machine learning systems to expand their inference capabilities to new classes using only a few labeled examples, without forgetting the previously learned classes. Classical backpropagation-based learning and its variants are often unsuitable for battery-powered, memory-constrained systems at the extreme edge. In this work, we introduce Online Few-Shot Class-Incremental Learning (O-FSCIL), based on a lightweight model consisting of a pretrained and metalearned feature extractor and an expandable explicit memory storing the class prototypes. The architecture is pretrained with a novel feature orthogonality regularization and metalearned with a multi-margin loss. For learning a new class, our approach extends the explicit memory with novel class prototypes, while the remaining architecture is kept frozen. This allows learning previously unseen classes based on only a few examples with one single pass (hence online). O-FSCIL obtains an average accuracy of 68.62% on the FSCIL CIFAR100 benchmark, achieving state-of-the-art results. Tailored for ultra-low-power platforms, we implement O-FSCIL on the 60 mW GAP9 microcontroller, demonstrating online learning capabilities within just 12 mJ per new class.
https://arxiv.org/abs/2403.07851
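The expandable explicit memory of class prototypes can be sketched as follows; mean-pooled, cosine-matched prototypes are assumptions consistent with, but not guaranteed to match, the paper:

```python
import numpy as np

class PrototypeMemory:
    """Sketch of O-FSCIL's expandable explicit memory: one normalized averaged
    prototype per class, written in a single pass over a few examples while the
    feature extractor stays frozen; inference is nearest-prototype matching."""
    def __init__(self):
        self.prototypes = {}

    def add_class(self, label, features):
        # One-pass (online) class addition: average the few-shot features.
        proto = features.mean(axis=0)
        self.prototypes[label] = proto / np.linalg.norm(proto)

    def predict(self, feature):
        feature = feature / np.linalg.norm(feature)
        return max(self.prototypes, key=lambda c: float(self.prototypes[c] @ feature))
```

Because only the prototype store grows and nothing is retrained, earlier classes are never overwritten, which is what makes the approach fit a 60 mW microcontroller budget.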
When training deep neural networks, the phenomenon of $\textit{dying neurons}$ (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon, focusing on sparsity and pruning. By systematically exploring the impact of various hyperparameter configurations on dying neurons, we unveil their potential to facilitate simple yet effective structured pruning algorithms. We introduce $\textit{Demon Pruning}$ (DemP), a method that controls the proliferation of dead neurons, dynamically leading to network sparsity. Achieved through a combination of noise injection on active units and a one-cycled schedule regularization strategy, DemP stands out for its simplicity and broad applicability. Experiments on CIFAR10 and ImageNet datasets demonstrate that DemP surpasses existing structured pruning techniques, showcasing superior accuracy-sparsity tradeoffs and training speedups. These findings suggest a novel perspective on dying neurons as a valuable resource for efficient model compression and optimization.
https://arxiv.org/abs/2403.07688
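The two DemP ingredients highlighted above, noise on active units and structured removal of dead units, can be sketched as follows (the one-cycled regularization schedule is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_active_units(weights, active_mask, scale=0.01):
    """DemP-style noise injection (sketch): perturb only still-active units,
    nudging borderline neurons toward dying so they can be pruned away."""
    noise = scale * rng.standard_normal(weights.shape)
    return weights + noise * active_mask[:, None]

def prune_dead(weights, activations):
    """Structured pruning step: drop output units whose activations were zero
    on every sample (the dead neurons)."""
    alive = (np.abs(activations) > 0.0).any(axis=0)
    return weights[alive], alive
```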
We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich structure of parameters contained within their optimization trajectories. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which reveal the inherent nuance and interplay involved between various optimization choices, such as momentum, weight decay, and batch size. We use them to provide key hallmarks about the nature of optimization in deep neural networks: when it goes right, and when it finds itself in a dead end. Further, thanks to our trajectory perspective, we uncover an intertwined behaviour of momentum and weight decay that promotes directional exploration, as well as a directional regularization behaviour of some others. We perform experiments over large-scale vision and language settings, including large language models (LLMs) with up to 12 billion parameters, to demonstrate the value of our approach.
https://arxiv.org/abs/2403.07379