As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods that give each feature in an input a score corresponding to its influence on a model's output. A major practical limitation of this family of explainers is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy; the additional term measures the difference in feature attribution between a pair of explainers. We observe on three datasets that training a model with this loss term improves explanation consensus on unseen data, and that consensus also improves between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance, and finally we study the influence our method has on feature attribution explanations.
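The abstract describes the objective only in words; here is a minimal sketch of what such a combined loss could look like. The function names, the Pearson-style agreement measure, and the weighting lam are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def pear_style_loss(task_loss, attr_a, attr_b, lam=0.5):
    """Hypothetical PEAR-style objective: the standard task loss plus a
    penalty on disagreement between two post hoc feature-attribution
    vectors computed for the same input."""
    a = (attr_a - attr_a.mean()) / (attr_a.std() + 1e-8)
    b = (attr_b - attr_b.mean()) / (attr_b.std() + 1e-8)
    agreement = float(np.mean(a * b))           # Pearson correlation of attributions
    return task_loss + lam * (1.0 - agreement)  # low agreement -> higher loss

# toy usage: two explainers scoring the same four input features
print(pear_style_loss(0.7, np.array([0.9, 0.1, 0.3, 0.0]),
                           np.array([0.8, 0.2, 0.1, 0.1])))
```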
https://arxiv.org/abs/2303.13299
Domain generalization (DG) aims to alleviate the poor generalization capability of deep neural networks by learning a model from multiple source domains. A classical solution to DG is domain augmentation, the common belief behind which is that diversifying the source domains is conducive to out-of-distribution generalization. However, these claims are understood intuitively rather than mathematically. Our explorations empirically reveal that the correlation between model generalization and the diversity of domains may not be strictly positive, which limits the effectiveness of domain augmentation. This work therefore aims to guarantee and further enhance the validity of this strand. To this end, we propose a new perspective on DG that recasts it as a convex game between domains. We first encourage each diversified domain to enhance model generalization by elaborately designing a regularization term based on supermodularity. Meanwhile, a sample filter is constructed to eliminate low-quality samples, thereby avoiding the impact of potentially harmful information. Our framework presents a new avenue for the formal analysis of DG; heuristic analysis and extensive experiments demonstrate its rationality and effectiveness.
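For readers unfamiliar with the term: a set function $v$ (here, loosely, the generalization benefit of a coalition of source domains) is supermodular, the condition that defines a convex game, when marginal contributions grow with the coalition:

$$ v(S \cup \{i\}) - v(S) \;\le\; v(T \cup \{i\}) - v(T) \qquad \text{for all } S \subseteq T,\; i \notin T. $$

Intuitively, adding one more diversified domain should help at least as much when the other domains are already present, which is the property the regularization term above is designed to encourage. (The mapping of $v$ to the paper's exact objective is our gloss, not a quote.)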
https://arxiv.org/abs/2303.13297
Despite the great success in 2D editing using user-friendly tools, such as Photoshop, semantic strokes, or even text prompts, similar capabilities in 3D areas are still limited, either relying on 3D modeling skills or allowing editing within only a few categories. In this paper, we present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image, and faithfully delivers edited novel views with high fidelity and multi-view consistency. To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space, and develop a series of techniques to aid the editing process, including cyclic constraints with a proxy mesh to facilitate geometric supervision, a color compositing mechanism to stabilize semantic-driven texture editing, and a feature-cluster-based regularization to preserve the irrelevant content unchanged. Extensive experiments and editing examples on both real-world and synthetic data demonstrate that our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes. Our project webpage: this https URL.
https://arxiv.org/abs/2303.13277
In this paper, we investigate an open research task of generating controllable 3D textured shapes from given textual descriptions. Previous works either require ground-truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions provide high-level semantic supervision for generated 3D shapes. Further, in order to produce fine-grained textures and increase geometry diversity, we propose to adopt low-level image regularization to align fake-rendered images with real ones. During the inference phase, our proposed model can generate 3D textured shapes from the given text without any additional optimization. We conduct extensive experiments to analyze each of our proposed components and show the efficacy of our framework in generating high-fidelity, text-relevant 3D textured shapes.
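The retrieval-plus-template step can be pictured in a few lines of code. This sketch assumes precomputed CLIP image and word embeddings and a made-up template, so every name below is illustrative rather than the paper's pipeline:

```python
import numpy as np

def pseudo_caption(image_emb, word_embs, vocab, k=3,
                   template="a 3D rendering of a {} object"):
    """Rank a vocabulary by cosine similarity between a rendered-image
    embedding and word embeddings, then fill a caption template."""
    sims = word_embs @ image_emb / (
        np.linalg.norm(word_embs, axis=1) * np.linalg.norm(image_emb) + 1e-8)
    top_words = [vocab[i] for i in np.argsort(-sims)[:k]]
    return template.format(" ".join(top_words))

rng = np.random.default_rng(0)  # stand-ins for real CLIP embeddings
vocab = ["wooden", "chair", "red", "car", "round"]
print(pseudo_caption(rng.normal(size=8), rng.normal(size=(5, 8)), vocab))
```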
https://arxiv.org/abs/2303.13273
Recent advances in 3D scene representation and novel view synthesis have witnessed the rise of Neural Radiance Fields (NeRFs). Nevertheless, it is not trivial to exploit NeRF for the photorealistic 3D scene stylization task, which aims to generate visually consistent and photorealistic stylized scenes from novel views. Simply coupling NeRF with photorealistic style transfer (PST) results in cross-view inconsistency and degradation of stylized view syntheses. Through a thorough analysis, we demonstrate that this non-trivial task can be simplified in a new light: when transforming the appearance representation of a pre-trained NeRF with a Lipschitz mapping, the consistency and photorealism across source views are seamlessly encoded into the syntheses. That motivates us to build a concise and flexible learning framework, named LipRF, which upgrades arbitrary 2D PST methods with a Lipschitz mapping tailored to the 3D scene. Technically, LipRF first pre-trains a radiance field to reconstruct the 3D scene, and then uses the 2D PST stylization of each view as a prior for learning a Lipschitz network that stylizes the pre-trained appearance. Given that the Lipschitz condition highly impacts the expressivity of the neural network, we devise an adaptive regularization to balance reconstruction and stylization. A gradual gradient aggregation strategy is further introduced to optimize LipRF in a cost-efficient manner. We conduct extensive experiments to show the high quality and robust performance of LipRF on both photorealistic 3D stylization and object appearance editing.
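As a concrete reference point for what constraining a network to be Lipschitz means in practice, here is one standard construction (spectral normalization via power iteration). This is background on the Lipschitz condition, not LipRF's actual parameterization:

```python
import numpy as np

def lipschitz_linear(W, x, c=1.0, iters=20):
    """Apply a linear map whose Lipschitz constant (spectral norm) is
    clamped to at most c, using power iteration to estimate ||W||_2."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(iters):          # power iteration on W^T W
        v = W.T @ (W @ v)
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(W @ v)   # leading singular value of W
    return (W / max(1.0, sigma / c)) @ x

W = np.array([[3.0, 0.0], [0.0, 0.5]])
print(lipschitz_linear(W, np.array([1.0, 1.0])))  # rescaled so ||W||_2 <= 1
```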
https://arxiv.org/abs/2303.13232
Class-Incremental Learning updates a deep classifier with new categories while maintaining accuracy on previously observed classes. Regularizing the neural network weights is a common way to prevent forgetting previously learned classes while learning novel ones. However, existing regularizers use a constant magnitude throughout the learning sessions, which may not reflect the varying difficulty of the tasks encountered during incremental learning. This study investigates the necessity of adaptive regularization in Class-Incremental Learning, which dynamically adjusts the regularization strength according to the complexity of the task at hand. We propose a Bayesian Optimization-based approach to automatically determine the optimal regularization magnitude for each learning task. Our experiments on two datasets with two regularizers demonstrate the importance of adaptive regularization for accurate and less forgetful visual incremental learning.
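The outer tuning loop is easy to picture. Below is a schematic using scikit-optimize's gp_minimize, where a toy quadratic stands in for "train one incremental session with regularization magnitude lam, then return validation error plus forgetting"; the objective, search range, and use of skopt are our assumptions for illustration, not the paper's setup:

```python
import numpy as np
from skopt import gp_minimize  # one off-the-shelf Bayesian optimizer

def session_objective(params):
    """Stand-in for: train a session with regularization magnitude
    params[0], then score accuracy and forgetting on held-out data.
    The toy quadratic below just gives the optimizer a minimum."""
    lam = params[0]
    return (np.log10(lam) - 0.5) ** 2  # pretend the best magnitude is ~10^0.5

res = gp_minimize(session_objective, [(1e-3, 1e2, "log-uniform")],
                  n_calls=20, random_state=0)
print("chosen regularization magnitude:", res.x[0])
```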
https://arxiv.org/abs/2303.13113
In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect that leave-one-out training has on the loss function, have been shown to be fragile. The proposed reason for their fragility remains unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we investigate the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence-function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures themselves may cause the observed fragility.
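For reference, the influence function under scrutiny is the standard first-order approximation (Koh and Liang, 2017) of how upweighting a training point $z$ changes the loss at a test point:

$$ \mathcal{I}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top} H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta), \qquad H_{\hat\theta} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta), $$

whose derivation assumes the Hessian $H_{\hat\theta}$ is positive definite, which is exactly the convexity assumption the paper relaxes.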
https://arxiv.org/abs/2303.12922
3D GAN inversion aims to achieve high reconstruction fidelity and reasonable 3D geometry simultaneously from a single image input. However, existing 3D GAN inversion methods rely on time-consuming optimization for each individual case. In this work, we introduce a novel encoder-based inversion framework based on EG3D, one of the most widely used 3D GAN models. We leverage the inherent properties of EG3D's latent space to design a discriminator and a background depth regularization. This enables us to train a geometry-aware encoder capable of converting an input image into the corresponding latent code. Additionally, we explore the feature space of EG3D and develop an adaptive refinement stage that improves the representation ability of features in EG3D to enhance the recovery of fine-grained textural details. Finally, we propose an occlusion-aware fusion operation to prevent distortion in unobserved regions. Our method achieves impressive results comparable to optimization-based methods while operating up to 500 times faster. Our framework is well-suited for applications such as semantic editing.
https://arxiv.org/abs/2303.12326
Prompt tuning is a parameter-efficient method that learns soft prompts and conditions frozen language models to perform specific downstream tasks. Though effective, prompt tuning under few-shot settings heavily relies on a good initialization of soft prompts on the one hand, and can easily result in overfitting on the other. Existing works leverage pre-training or supervised meta-learning to initialize soft prompts, but they cannot generalize data-efficiently to unseen downstream tasks. To address the above problems, this paper proposes a novel Self-sUpervised meta-Prompt learning framework with meta-gradient Regularization for few-shot generalization (SUPMER). We first design a set of self-supervised anchor meta-training tasks with different task formats and further enrich the task distribution with curriculum-based task augmentation. Then a novel meta-gradient regularization method is integrated into meta-prompt learning. It meta-learns to transform the raw gradients during few-shot learning into a domain-generalizable direction, thus alleviating the problem of overfitting. Extensive experiments show that SUPMER achieves better performance on different few-shot downstream tasks and also exhibits stronger domain generalization ability.
https://arxiv.org/abs/2303.12314
To address the challenges of long-tailed classification, researchers have proposed several approaches to reduce model bias, most of which assume that classes with few samples are weak classes. However, recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets, suggesting the existence of other factors that affect model bias. In this work, we systematically propose a series of geometric measurements for perceptual manifolds in deep neural networks, and then explore the effect of the geometric characteristics of perceptual manifolds on classification difficulty, as well as how learning shapes these characteristics. An unanticipated finding is that the correlation between class accuracy and the separation degree of perceptual manifolds gradually decreases during training, while the negative correlation with curvature gradually increases, implying that curvature imbalance leads to model bias. Therefore, we propose curvature regularization to facilitate the model to learn curvature-balanced and flatter perceptual manifolds. Evaluations on multiple long-tailed and non-long-tailed datasets show the excellent performance and exciting generality of our approach, especially in achieving significant performance improvements on top of current state-of-the-art techniques. Our work opens up a geometric analysis perspective on model bias and reminds researchers to pay attention to model bias on non-long-tailed and even sample-balanced datasets. The code and model will be made public.
https://arxiv.org/abs/2303.12307
Binary concepts are used empirically by humans to generalize efficiently, and they are based on the Bernoulli distribution, the building block of information. These concepts span both low-level and high-level features, such as "large vs. small" and "a neuron is active or inactive". Binary concepts are ubiquitous features and can be used to transfer knowledge to improve model generalization. We propose a novel binarized regularization to facilitate the learning of binary concepts and thereby improve the quality of data generation in autoencoders. We introduce a binarizing hyperparameter $r$ into the data generation process to disentangle the latent space symmetrically. We demonstrate that this method can be applied easily to existing variational autoencoder (VAE) variants to encourage symmetric disentanglement, improve reconstruction quality, and prevent posterior collapse without computational overhead. We also demonstrate that this method can boost existing models to learn more transferable representations and to generate samples more representative of the input distribution, which can alleviate catastrophic forgetting when using generative replay under continual learning settings.
https://arxiv.org/abs/2303.12255
We present a new method of self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV-MR). MV-MR is based on maximizing the dependence between learnable embeddings from augmented and non-augmented views, jointly with maximizing the dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use any contrastive learning, clustering, or stop gradients. MV-MR is a generic framework allowing the incorporation of constraints on the learnable embeddings via the usage of image multi-representations as regularizers. Along this line, knowledge distillation is considered a particular case of such regularization. MV-MR provides state-of-the-art performance among non-contrastive and clustering-free methods on the STL10 and ImageNet-1K datasets. We show that a lower-complexity ResNet50 model pretrained using the proposed knowledge distillation, based on the CLIP ViT model, achieves state-of-the-art performance on STL10 linear evaluation. The code is available at: this https URL
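One concrete dependence measure that can play this role between batches of embeddings is distance correlation. The sketch below illustrates "dependence maximization as a regularizer" in general; it is not a claim about the paper's exact estimator:

```python
import numpy as np

def distance_correlation(X, Y):
    """Empirical distance correlation between two batches of embeddings;
    unlike plain Pearson correlation it also detects nonlinear dependence,
    and it is zero iff the batches are (empirically) independent."""
    def centered(D):  # double-center a pairwise-distance matrix
        return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()
    a = centered(np.linalg.norm(X[:, None] - X[None], axis=-1))
    b = centered(np.linalg.norm(Y[:, None] - Y[None], axis=-1))
    dcov2 = (a * b).mean()
    dvar = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max(dcov2, 0.0) / (dvar + 1e-12)))

rng = np.random.default_rng(0)
Z = rng.normal(size=(64, 16))
print(distance_correlation(Z, Z ** 2))  # nonlinear dependence is detected
```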
https://arxiv.org/abs/2303.12130
Large scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images that convey complex visual concepts. This generative power has more recently been leveraged to perform text-to-3D synthesis. In this work, we present a technique that harnesses the power of latent diffusion models for editing existing 3D objects. Our method takes oriented 2D images of a 3D object as input and learns a grid-based volumetric representation of it. To guide the volumetric representation to conform to a target text prompt, we follow unconditional text-to-3D methods and optimize a Score Distillation Sampling (SDS) loss. However, we observe that combining this diffusion-guided loss with an image-based regularization loss that encourages the representation not to deviate too strongly from the input object is challenging, as it requires achieving two conflicting goals while viewing only structure-and-appearance coupled 2D projections. Thus, we introduce a novel volumetric regularization loss that operates directly in 3D space, utilizing the explicit nature of our 3D representation to enforce correlation between the global structure of the original and edited object. Furthermore, we present a technique that optimizes cross-attention volumetric grids to refine the spatial extent of the edits. Extensive experiments and comparisons demonstrate the effectiveness of our approach in creating a myriad of edits which cannot be achieved by prior works.
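For context, the Score Distillation Sampling loss referenced here is typically written through its gradient (as in DreamFusion): for a rendering $x = g(\theta)$ of the 3D representation $\theta$, noised to $z_t$,

$$ \nabla_\theta \mathcal{L}_{\text{SDS}} \;=\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat\epsilon_\phi(z_t;\, y,\, t) - \epsilon\big)\, \frac{\partial x}{\partial \theta} \right], $$

where $\hat\epsilon_\phi$ is the frozen diffusion model's noise prediction conditioned on the text prompt $y$ and $w(t)$ is a timestep weighting. The paper's contribution is the volumetric regularizer added alongside this term, whose exact form is not reproduced here.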
https://arxiv.org/abs/2303.12048
The rising performance of deep neural networks is often empirically attributed to an increase in the available computational power, which allows complex models to be trained on large amounts of annotated data. However, increased model complexity leads to costly deployment of modern neural networks, while gathering such amounts of data requires huge costs to avoid label noise. In this work, we study the ability of compression methods to tackle both of these problems at once. We hypothesize that quantization-aware training, by restricting the expressivity of neural networks, behaves as a regularizer. Thus, it may help fight overfitting on noisy data while also allowing the model to be compressed at inference time. We first validate this claim on a controlled test with manually introduced label noise. Furthermore, we also test the proposed method on Facial Action Unit detection, where labels are typically noisy due to the subtlety of the task. In all cases, our results suggest that quantization significantly improves results compared with existing baselines, regularization, and other compression methods.
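A minimal sketch of the mechanism being hypothesized about: uniform fake quantization restricts weights to a coarse b-bit grid in the forward pass (during training, a straight-through estimator would let gradients skip the rounding). The helper below is illustrative, not the paper's training code:

```python
import numpy as np

def fake_quantize(w, bits=8):
    """Symmetric uniform quantization of a weight tensor to a b-bit grid,
    then dequantization back to floats, as in quantization-aware training."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

w = np.random.default_rng(0).normal(size=5)
print(w)
print(fake_quantize(w, bits=4))  # coarser grid = stronger capacity restriction
```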
https://arxiv.org/abs/2303.11803
Deep neural networks are widely recognized as being vulnerable to adversarial perturbation. To overcome this challenge, developing a robust classifier is crucial. So far, two well-known defenses have been adopted to improve the learning of robust classifiers, namely adversarial training (AT) and Jacobian regularization. However, each approach behaves differently against adversarial perturbations. First, our work carefully analyzes and characterizes these two schools of approaches, both theoretically and empirically, to demonstrate how each approach impacts the robust learning of a classifier. Next, we propose our novel Optimal Transport with Jacobian regularization method, dubbed OTJR, jointly incorporating the input-output Jacobian regularization into the AT by leveraging the optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance that can efficiently push the adversarial samples' representations closer to those of clean samples, regardless of the number of classes within the dataset. The SW distance provides the adversarial samples' movement directions, which are much more informative and powerful for the Jacobian regularization. Our extensive experiments demonstrate the effectiveness of our proposed method, which jointly incorporates Jacobian regularization into AT. Furthermore, we demonstrate that our proposed method consistently enhances the model's robustness with CIFAR-100 dataset under various adversarial attack settings, achieving up to 28.49% under AutoAttack.
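The Sliced Wasserstein distance at the heart of the method is simple to state in code: project both batches of representations onto random directions, sort, and compare the resulting one-dimensional distributions. A self-contained sketch, where batch shapes and the projection count are arbitrary choices:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=128, seed=0):
    """Monte-Carlo estimate of the sliced Wasserstein-2 distance between
    two equal-size batches of d-dimensional representations."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(X.shape[1], n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)  # unit directions
    xp = np.sort(X @ theta, axis=0)  # 1-D projections, sorted
    yp = np.sort(Y @ theta, axis=0)  # (sorting solves 1-D optimal transport)
    return float(np.sqrt(((xp - yp) ** 2).mean()))

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(256, 32))
adv = rng.normal(1.0, 1.0, size=(256, 32))  # stand-in for adversarial features
print(sliced_wasserstein(clean, adv))
```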
https://arxiv.org/abs/2303.11793
Numerous research efforts have been made to stabilize the training of Generative Adversarial Networks (GANs), for example through regularization and architecture design. However, we identify that the instability can also arise from the fragile balance at the early stage of adversarial learning. This paper proposes CoopInit, a simple yet effective cooperative-learning-based initialization strategy that can quickly learn a good starting point for GANs, with very small computational overhead during training. The proposed algorithm consists of two learning stages: (i) Cooperative initialization stage: the discriminator of the GAN is treated as an energy-based model (EBM) and is optimized via maximum likelihood estimation (MLE), with the GAN's generator providing synthetic data to approximate the learning gradients. The EBM also guides the MLE learning of the generator via MCMC teaching. (ii) Adversarial finalization stage: after a few initialization iterations, the algorithm seamlessly transitions to regular mini-max adversarial training until convergence. The motivation is that the MLE-based initialization stage drives the model towards mode coverage, which helps alleviate mode dropping during the adversarial learning stage. We demonstrate the effectiveness of the proposed approach on image generation and one-sided unpaired image-to-image translation tasks through extensive experiments.
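The cooperative stage leans on the standard maximum-likelihood gradient for an energy-based model $p_\theta(x) \propto \exp(f_\theta(x))$:

$$ \nabla_\theta \log p_\theta(x) \;=\; \nabla_\theta f_\theta(x) \;-\; \mathbb{E}_{x' \sim p_\theta}\big[\nabla_\theta f_\theta(x')\big], $$

where the intractable model expectation is approximated with the generator's synthetic samples (refined by MCMC), which is exactly the role the abstract assigns to the GAN's generator during initialization.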
https://arxiv.org/abs/2303.11649
Semi-supervised semantic segmentation learns a model for classifying pixels into specific classes using a few labeled samples and numerous unlabeled images. The recent leading approach is consistency regularization by self-training, pseudo-labeling pixels with high confidence in unlabeled images. However, using only high-confidence pixels for self-training may lose much of the information in the unlabeled datasets due to the poor confidence calibration of modern deep learning networks. In this paper, we propose a class-adaptive semi-supervision framework for semi-supervised semantic segmentation (CAFS) to cope with the loss of most information that occurs in existing high-confidence-based pseudo-labeling methods. Unlike existing semi-supervised semantic segmentation frameworks, CAFS constructs a validation set on a labeled dataset to leverage the calibration performance of each class. On this basis, we propose calibration-aware class-wise adaptive thresholding and class-wise adaptive oversampling using the analysis results from the validation set. Our proposed CAFS achieves state-of-the-art performance on the full data partition of the base PASCAL VOC 2012 dataset and on the 1/4 data partition of the Cityscapes dataset, with significant margins of 83.0% and 80.4%, respectively. The code is available at this https URL.
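The thresholding idea can be made concrete with a small sketch: derive one confidence bar per class from the labeled validation set, then pseudo-label only pixels that clear their class's bar. The specific rule below (relaxing the bar in proportion to validation precision) is an illustrative assumption, not the paper's exact formula:

```python
import numpy as np

def classwise_thresholds(val_pred, val_true, n_classes, base=0.95, relax=0.3):
    """Per-class pseudo-label thresholds: classes that are reliable on
    the validation set get a lower bar, unreliable classes keep a high one."""
    th = np.full(n_classes, base)
    for c in range(n_classes):
        mask = val_pred == c
        if mask.any():
            precision = (val_true[mask] == c).mean()  # trustworthiness of class c
            th[c] = base - relax * precision          # reliable -> lower threshold
    return th

rng = np.random.default_rng(0)
th = classwise_thresholds(rng.integers(0, 3, 100), rng.integers(0, 3, 100), 3)
probs = rng.dirichlet(np.ones(3), size=10)            # unlabeled-pixel softmax
keep = probs.max(axis=1) >= th[probs.argmax(axis=1)]  # pixels we pseudo-label
print(th, keep.sum())
```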
https://arxiv.org/abs/2303.11606
Deep Neural Networks (DNNs)-based semantic segmentation models trained on a source domain often struggle to generalize to unseen target domains, i.e., a domain gap problem. Texture often contributes to the domain gap, making DNNs vulnerable to domain shift because they are prone to be texture-biased. Existing Domain Generalized Semantic Segmentation (DGSS) methods have alleviated the domain gap problem by guiding models to prioritize shape over texture. On the other hand, shape and texture are two prominent and complementary cues in semantic segmentation. This paper argues that leveraging texture is crucial for improving performance in DGSS. Specifically, we propose a novel framework, coined Texture Learning Domain Randomization (TLDR). TLDR includes two novel losses to effectively enhance texture learning in DGSS: (1) a texture regularization loss to prevent overfitting to source domain textures by using texture features from an ImageNet pre-trained model and (2) a texture generalization loss that utilizes random style images to learn diverse texture representations in a self-supervised manner. Extensive experimental results demonstrate the superiority of the proposed TLDR; e.g., TLDR achieves 46.5 mIoU on GTA-to-Cityscapes using ResNet-50, which improves the prior state-of-the-art method by 1.9 mIoU.
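A common way to compare "texture" between two feature maps, and a plausible shape for the first loss described above, is a Gram-matrix distance as in neural style transfer. The sketch assumes precomputed (C, H, W) feature maps and is illustrative, not TLDR's exact loss:

```python
import numpy as np

def gram(feat):
    """Gram matrix of a (C, H, W) feature map: the classic texture
    descriptor from neural style transfer."""
    C = feat.shape[0]
    f = feat.reshape(C, -1)
    return (f @ f.T) / f.shape[1]

def texture_distance(feat_model, feat_imagenet):
    """Match texture statistics of the segmentation network's features
    against those from a frozen ImageNet-pretrained encoder."""
    diff = gram(feat_model) - gram(feat_imagenet)
    return float((diff ** 2).mean())

rng = np.random.default_rng(0)
print(texture_distance(rng.normal(size=(8, 16, 16)),
                       rng.normal(size=(8, 16, 16))))
```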
https://arxiv.org/abs/2303.11546
Geological processes determine the distribution of resources such as critical minerals, water, and geothermal energy. However, direct observation of geology is often prevented by surface cover such as overburden or vegetation. In such cases, remote and in-situ surveys are frequently conducted to collect physical measurements of the earth indicative of the geology. Developing a geological segmentation based on these measurements is challenging, since individual datasets can differ in properties (e.g. units, dynamic ranges, textures) and the data does not uniquely constrain the geology. Further, as the number of datasets grows, the information available to constrain geology increases while simultaneously becoming harder to make sense of. Inspired by the concept of superpixels, we propose a deep-learning-based approach to segment rasterized survey data into regions with similar characteristics. We demonstrate its use for semi-automated geoscientific mapping with datasets arising from independent sensors and with diverse properties. In addition, we introduce a new loss function for superpixels, including a novel regularization parameter that penalizes segmentations containing non-connected superpixels. This improves the integration of prior knowledge by allowing better control over the number of superpixels generated.
https://arxiv.org/abs/2303.11404
Neural radiance fields (NeRFs) have demonstrated state-of-the-art performance for 3D computer vision tasks, including novel view synthesis and 3D shape reconstruction. However, these methods fail in adverse weather conditions. To address this challenge, we introduce DehazeNeRF as a framework that robustly operates in hazy conditions. DehazeNeRF extends the volume rendering equation by adding physically realistic terms that model atmospheric scattering. By parameterizing these terms using suitable networks that match the physical properties, we introduce effective inductive biases, which, together with the proposed regularizations, allow DehazeNeRF to demonstrate successful multi-view haze removal, novel view synthesis, and 3D shape reconstruction where existing approaches fail.
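For context, the two ingredients being combined are standard. NeRF's volume rendering along a ray $\mathbf{r}(t)$:

$$ \hat{C}(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t))\,dt, \qquad T(t) = \exp\!\Big(-\!\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big), $$

and the classic single-scattering haze model, $\mathbf{I} = \mathbf{J}\,e^{-\beta d} + \mathbf{A}\,(1 - e^{-\beta d})$, with airlight $\mathbf{A}$, scattering coefficient $\beta$, and depth $d$. How DehazeNeRF parameterizes and merges these terms is the paper's contribution; the equations above are the textbook forms, not its exact formulation.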
https://arxiv.org/abs/2303.11364