Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling (MLM) on unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation, which enables data augmentation for self-finetuning, and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.
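As a rough illustration of how the NLGU idea can be cast in T5's text-to-text format, the sketch below builds (source, target) pairs for the three training streams; the prompt wording, label verbalisations and example sentences are assumptions for illustration, not the templates used by the authors.

```python
# Sketch: casting the DoT5-style objectives as T5 text-to-text pairs.
# Prompt wording and label verbalisations are illustrative assumptions.
import random

def mlm_pair(in_domain_sentence, mask_token="<extra_id_0>", mask_rate=0.15):
    """Domain objective: span-mask an unlabelled in-domain sentence."""
    tokens = in_domain_sentence.split()
    n_mask = max(1, int(len(tokens) * mask_rate))
    start = random.randrange(0, len(tokens) - n_mask + 1)
    masked_span = " ".join(tokens[start:start + n_mask])
    source = " ".join(tokens[:start] + [mask_token] + tokens[start + n_mask:])
    return source, f"{mask_token} {masked_span}"

def nlu_pair(premise, hypothesis, label):
    """Task objective (NLU): predict the NLI label on general-domain data."""
    return f"nli premise: {premise} hypothesis: {hypothesis}", label

def nlg_pair(premise, hypothesis, label):
    """Task objective (NLG): generate a hypothesis given premise and label,
    later reused to synthesise in-domain examples for self-finetuning."""
    return f"generate {label} hypothesis for: {premise}", hypothesis

# The three streams are mixed into one batch so domain and task knowledge
# are learned jointly in a multi-task fashion.
examples = [
    mlm_pair("The chest radiograph shows bilateral pleural effusions."),
    nlu_pair("A man is playing guitar.", "A person makes music.", "entailment"),
    nlg_pair("A man is playing guitar.", "A person makes music.", "entailment"),
]
for src, tgt in examples:
    print(f"{src!r} -> {tgt!r}")
```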
https://arxiv.org/abs/2303.13386
Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them. To this end, the modular, real-time, and open-source AirSim framework has been a popular community-built system that fulfills some of those needs. However, the framework required additional systems to serve some complex industrial applications, including designing and testing new sensor modalities, Simultaneous Localization And Mapping (SLAM), autonomous navigation algorithms, and transfer learning with machine learning models. In this work, we discuss the modifications and additions to our open-source version of the AirSim simulation framework, including new sensor modalities, vehicle types, and methods to procedurally generate realistic environments with changeable objects. Furthermore, we show the various applications and use cases the framework can serve.
https://arxiv.org/abs/2303.13381
Parameter-efficient transfer learning with adapters has been studied in Natural Language Processing (NLP) as an alternative to full fine-tuning. Adapters are memory-efficient and scale well with downstream tasks by training small bottleneck layers added between transformer layers while keeping the large pretrained language models (PLMs) frozen. Despite showing promising results in NLP, these methods are under-explored in Information Retrieval (IR). While previous studies have only experimented with dense retrievers or in a cross-lingual retrieval scenario, in this paper we aim to complete the picture on the use of adapters in IR. First, we study adapters for SPLADE, a sparse retriever, for which adapters not only retain the efficiency and effectiveness otherwise achieved by fine-tuning, but are also memory-efficient and orders of magnitude lighter to train. We observe that Adapters-SPLADE not only optimizes just 2% of training parameters, but also outperforms its fully fine-tuned counterpart and existing parameter-efficient dense IR models on IR benchmark datasets. Second, we address domain adaptation of neural retrieval using adapters on the cross-domain BEIR datasets and TripClick. Finally, we also consider knowledge sharing between rerankers and first-stage rankers. Overall, our study completes the examination of adapters for neural IR.
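For readers unfamiliar with adapters, the following is a minimal PyTorch sketch of the usual bottleneck-adapter recipe the abstract refers to (small residual layers trained while the PLM stays frozen); the dimensions, layer count and wiring are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a bottleneck adapter with a frozen backbone (illustrative sizes).
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual form: the adapter only learns a small correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def add_adapters_and_freeze(backbone, num_layers=12, hidden_size=768, bottleneck=64):
    """Freeze the PLM and return one adapter per layer (parameter bookkeeping only;
    how the adapters are wired into each block depends on the backbone)."""
    for p in backbone.parameters():
        p.requires_grad = False
    adapters = nn.ModuleList([Adapter(hidden_size, bottleneck) for _ in range(num_layers)])
    trainable = sum(p.numel() for p in adapters.parameters())
    total = trainable + sum(p.numel() for p in backbone.parameters())
    print(f"trainable fraction: {trainable / total:.2%}")   # typically a few percent
    return adapters

# Usage with a small stand-in encoder (any frozen PLM would do):
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 12, batch_first=True), num_layers=12)
adapters = add_adapters_and_freeze(backbone)
```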
https://arxiv.org/abs/2303.13220
We address the challenge of training a large supernet for the object detection task, using a relatively small amount of training data. Specifically, we propose an efficient supernet-based neural architecture search (NAS) method that uses transfer learning and search space pruning. First, the supernet is pre-trained on a classification task, for which large datasets are available. Second, the search space defined by the supernet is pruned by removing candidate models that are predicted to perform poorly. To effectively remove the candidates over a wide range of resource constraints, we design a performance predictor, called path filter, which can accurately predict the relative performance of models that satisfy similar resource constraints. Hence, supernet training is more focused on the best-performing candidates. Our path filter handles prediction for paths with different resource budgets. Compared to once-for-all, our proposed method reduces the computational cost of finding the optimal network architecture by 30% and 63%, while yielding a better accuracy vs. floating-point-operations Pareto front (0.85 and 0.45 points of improvement in average precision on Pascal VOC and COCO, respectively).
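A minimal sketch of what a pairwise "path filter"-style predictor could look like: it scores encoded candidate sub-networks and is trained with a ranking loss so that only relative performance among paths under similar resource budgets matters. The path encoding, network sizes and training data here are assumptions, not the paper's design.

```python
# Sketch of a pairwise path-ranking predictor (PyTorch); a path is encoded as
# one choice index per searchable block, which is an illustrative assumption.
import torch
import torch.nn as nn

class PathFilter(nn.Module):
    def __init__(self, num_blocks=20, num_choices=4, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_choices, 16)
        self.mlp = nn.Sequential(
            nn.Linear(num_blocks * 16, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, paths):                    # paths: (batch, num_blocks) ints
        return self.mlp(self.embed(paths).flatten(1)).squeeze(-1)

filter_net = PathFilter()
optimizer = torch.optim.Adam(filter_net.parameters(), lr=1e-3)
rank_loss = nn.MarginRankingLoss(margin=0.1)

# Dummy training pair: two candidate paths with similar resource budgets and a
# +1/-1 target saying which one actually performed better when evaluated.
path_a = torch.randint(0, 4, (32, 20))
path_b = torch.randint(0, 4, (32, 20))
target = (torch.rand(32) > 0.5).float() * 2 - 1

loss = rank_loss(filter_net(path_a), filter_net(path_b), target)
loss.backward()
optimizer.step()
```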
https://arxiv.org/abs/2303.13121
Many material properties are manifested in the morphological appearance and characterized with microscopic images, such as scanning electron microscopy (SEM) images. Polymer compatibility is a key physical quantity of polymer materials and is commonly and intuitively judged from SEM images. However, human observation and judgement of the images is time-consuming, labor-intensive and hard to quantify. Computer image recognition with machine learning methods can make up for the shortcomings of manual judgement, giving accurate and quantitative results. We achieve automatic compatibility recognition using a convolutional neural network and transfer learning, and the model obtains up to 94% accuracy. We also put forward a quantitative criterion for polymer compatibility based on this model. The proposed method can be widely applied to the quantitative characterization of the microstructure and properties of various materials.
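The transfer-learning recipe described here is the standard one; a minimal PyTorch/torchvision sketch follows, assuming an ImageNet-pretrained ResNet-18 with a frozen backbone and a new binary head for compatible/incompatible SEM morphologies. The backbone choice and hyper-parameters are illustrative, not the paper's.

```python
# Sketch: frozen ImageNet-pretrained CNN + new binary head for SEM images.
import torch
import torch.nn as nn
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():                     # freeze the pretrained features
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # new trainable classifier head

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3), # SEM images are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):                  # images: (B, 3, 224, 224)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```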
https://arxiv.org/abs/2303.12360
We generated 25,000 conversations labeled with Big Five personality traits using prompt programming with GPT-3. We then train Big Five classification models on these data and evaluate them on 2,500 examples from generated dialogues and on real conversational datasets labeled with the Big Five by human annotators. The results indicate that this approach is promising for creating effective training data. We then compare the performance of different training approaches and models. Our results suggest that using Adapter-Transformers and transfer learning from a pre-trained RoBERTa sentiment analysis model performs best with the generated data. Our best model obtained an accuracy of 0.71 on generated data and 0.65 on real datasets. Finally, we discuss this approach's potential limitations and confidence metric.
https://arxiv.org/abs/2303.12279
Masked Autoencoders (MAEs) learn self-supervised representations by randomly masking input image patches and minimizing a reconstruction loss. Contrastive self-supervised methods instead encourage two versions of the same input to have similar representations, while pulling apart the representations of different inputs. We propose ViC-MAE, a general method that combines MAE and contrastive learning by pooling the local feature representations learned under the MAE reconstruction objective and leveraging this global representation under a contrastive objective across video frames. We show that visual representations learned under ViC-MAE generalize well to both video classification and image classification tasks. Using a ViT-B/16 backbone pre-trained on the Moments in Time (MiT) dataset, we obtain state-of-the-art transfer learning from video to images on ImageNet-1k, improving absolute top-1 accuracy by 1.58% over recent prior work. Moreover, our method maintains a competitive transfer-learning performance of 81.50% top-1 accuracy on the Kinetics-400 video classification benchmark. In addition, we show that despite its simplicity, ViC-MAE yields improved results compared to combining MAE pre-training with previously proposed contrastive objectives such as VICReg and SimSiam.
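A minimal sketch of the combination described above: patch-level features from a masked-reconstruction encoder are pooled into one global vector per frame and trained with an InfoNCE-style contrastive loss across two frames of the same video. The encoder is stubbed out with random features, and the pooling and temperature choices are assumptions.

```python
# Sketch: pool MAE patch features per frame, then apply a cross-frame
# InfoNCE-style contrastive loss (the MAE reconstruction branch is not shown).
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, D) pooled representations of two frames of the same video."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))          # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# patch_feats_*: (B, num_visible_patches, D) features from the MAE encoder on
# two frames; mean-pooling gives the global representation used contrastively.
patch_feats_a = torch.randn(8, 49, 768)
patch_feats_b = torch.randn(8, 49, 768)
loss = contrastive_loss(patch_feats_a.mean(dim=1), patch_feats_b.mean(dim=1))
```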
https://arxiv.org/abs/2303.12001
Contrastive vision-language models (e.g. CLIP) are typically created by updating all the parameters of a vision model and language model through contrastive training. Can such models be created by a small number of parameter updates to an already-trained language model and vision model? The literature describes techniques that can create vision-language models by updating a small number of parameters in a language model, but these require already aligned visual representations and are non-contrastive, hence unusable for latency-sensitive applications such as neural search. We explore the feasibility and benefits of parameter-efficient contrastive vision-language alignment through transfer learning: creating a model such as CLIP by minimally updating an already-trained vision and language model. We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training, and updating specific components (<1% of parameters) can match 75% of full-model training. We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training and that parameter-efficient training scales with model and dataset size. Where paired image-text data is scarce but strong multilingual language models exist (e.g. for low-resource languages), parameter-efficient training is even preferable to full-model training. Given a fixed compute budget, parameter-efficient training allows training larger models on the same hardware, achieving equivalent performance in less time. Parameter-efficient training hence constitutes an energy-efficient and effective training strategy for contrastive vision-language models that may be preferable to the full-model training paradigm for common use cases. Code and weights at this https URL.
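A minimal sketch of the parameter-efficient setup, assuming one plausible choice of which components to unfreeze (layer norms, biases and projection heads); the paper studies which subsets work best, so treat the keyword list below as an assumption rather than its recommendation.

```python
# Sketch: start from pretrained towers and unfreeze only a small, named subset
# of parameters before standard CLIP-style contrastive training.
import torch.nn as nn

def unfreeze_subset(model: nn.Module, trainable_keywords=("ln", "norm", "bias", "proj")):
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name.lower() for k in trainable_keywords)
        total += param.numel()
        trainable += param.numel() if param.requires_grad else 0
    print(f"updating {trainable / total:.1%} of parameters")
    return [p for p in model.parameters() if p.requires_grad]

# Usage (with any pretrained towers, e.g. a ViT and a text transformer):
#   params = unfreeze_subset(vision_tower) + unfreeze_subset(text_tower)
#   optimizer = torch.optim.AdamW(params, lr=1e-4)
# followed by the usual image-text contrastive (InfoNCE) objective.
```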
https://arxiv.org/abs/2303.11866
Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly. For segmentation applications, the burden is particularly high as manual delineations of relevant image content are often extremely expensive or can only be done by experts with domain-specific knowledge. Thanks to developments in transfer learning and training with weak supervision, segmentation models can now also greatly benefit from annotations of different kinds. However, for any new domain application looking to use weak supervision, the dataset builder still needs to define a strategy to distribute full segmentation and other weak annotations. Doing so is challenging, however, as it is a priori unknown how to distribute an annotation budget for a given new dataset. To this end, we propose a novel approach to determine annotation strategies for segmentation datasets by estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. To do so, our method sequentially determines the proportions of segmentation and classification annotations to collect for budget fractions by modeling the expected improvement of the final segmentation model. We show in our experiments that our approach yields annotations that perform very close to the optimal for a number of different annotation budgets and datasets.
https://arxiv.org/abs/2303.11678
Transfer learning is a popular method for tuning pretrained (upstream) models for different downstream tasks using limited data and computational resources. We study how an adversary with control over an upstream model used in transfer learning can conduct property inference attacks on a victim's tuned downstream model, for example, to infer the presence of images of a specific individual in the downstream training set. We demonstrate attacks in which an adversary can manipulate the upstream model to conduct highly effective and specific property inference attacks (AUC score > 0.9), without incurring significant performance loss on the main task. The main idea of the manipulation is to make the upstream model generate activations (intermediate features) with different distributions for samples with and without a target property, thus enabling the adversary to distinguish easily between downstream models trained with and without training examples that have the target property. Our code is available at this https URL.
https://arxiv.org/abs/2303.11643
In recent years there has been a growing demand from financial agents, especially from individual and institutional investors, for companies to report on climate-related financial risks. A vast amount of information, in text format, can be expected to be disclosed in the short term by firms in order to identify these types of risks in their financial and non-financial reports, particularly in response to the growing regulation that is being passed on the matter. To this end, this paper applies state-of-the-art NLP techniques to detect climate change in text corpora. We use transfer learning to fine-tune two transformer models: BERT and ClimateBert, a recently published DistilRoBERTa-based model that has been specifically tailored for climate text classification. These two algorithms are based on the transformer architecture, which enables learning the contextual relationships between words in a text. We carry out the fine-tuning process of both models on the novel ClimaText database, consisting of data collected from Wikipedia, 10-K file reports and web-based claims. The text classification model obtained by fine-tuning ClimateBert on ClimaText outperforms the models created with BERT and the current state-of-the-art transformer on this particular problem. Our study is the first to apply the recently published ClimateBert algorithm to the ClimaText database. Based on our results, ClimateBert fine-tuned on ClimaText is an outstanding tool among pre-trained NLP transformer models that may and should be used by investors, institutional agents and companies themselves to monitor the disclosure of climate risk in financial reports. In addition, our transfer learning methodology is computationally cheap, allowing any organization to apply it.
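A minimal sketch of the fine-tuning setup with Hugging Face transformers follows; the hub identifier for ClimateBert, the hyper-parameters and the toy examples standing in for ClimaText are assumptions for illustration.

```python
# Sketch: fine-tuning a ClimateBert-style checkpoint for climate text classification.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

model_name = "climatebert/distilroberta-base-climate-f"   # assumed hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Invented placeholder examples, not real ClimaText data.
train = Dataset.from_dict({
    "text": ["Rising sea levels threaten coastal infrastructure.",
             "The company opened a new sales office in Madrid."],
    "label": [1, 0],   # 1 = climate-related, 0 = not climate-related
}).map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length",
                            max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="climatebert-climatext",
                           num_train_epochs=3, per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()
```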
https://arxiv.org/abs/2303.13373
Recent studies show strong generative performance in domain translation especially by using transfer learning techniques on the unconditional generator. However, the control between different domain features using a single model is still challenging. Existing methods often require additional models, which is computationally demanding and leads to unsatisfactory visual quality. In addition, they have restricted control steps, which prevents a smooth transition. In this paper, we propose a new approach for high-quality domain translation with better controllability. The key idea is to preserve source features within a disentangled subspace of a target feature space. This allows our method to smoothly control the degree to which it preserves source features while generating images from an entirely new domain using only a single model. Our extensive experiments show that the proposed method can produce more consistent and realistic images than previous works and maintain precise controllability over different levels of transformation. The code is available at this https URL.
https://arxiv.org/abs/2303.11545
A major problem with using automated classification systems is that, if they are not engineered correctly and with fairness considerations, they could be detrimental to certain populations. Furthermore, while engineers have developed cutting-edge technologies for image classification, there is still a gap in the application of these models to human heritage collections, where data sets usually consist of low-quality pictures of people of diverse ethnicity, gender, and age. In this work, we evaluate three bias mitigation techniques using two state-of-the-art neural networks, Xception and EfficientNet, for gender classification. Moreover, we explore the use of transfer learning with a fair data set to overcome the scarcity of training data. We evaluated the effectiveness of the bias mitigation pipeline on a cultural heritage collection of photographs from the 19th and 20th centuries, and we used the FairFace data set for the transfer learning experiments. After the evaluation, we found that transfer learning is a good technique that allows better performance when working with a small data set. Moreover, the fairest classifier was obtained using transfer learning, threshold adjustment, re-weighting and image augmentation as bias mitigation methods.
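Two of the bias-mitigation steps mentioned above (re-weighting and threshold adjustment) can be sketched as follows; the grouping, weighting scheme and the balanced-accuracy criterion for picking thresholds are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: inverse-frequency sample re-weighting and per-group threshold selection.
import numpy as np

def inverse_frequency_weights(labels, groups):
    """One weight per sample so each (label, group) cell contributes equally."""
    labels, groups = np.asarray(labels), np.asarray(groups)
    cells = [(y, g) for y in np.unique(labels) for g in np.unique(groups)]
    weights = np.ones(len(labels), dtype=float)
    for y, g in cells:
        mask = (labels == y) & (groups == g)
        if mask.any():
            weights[mask] = len(labels) / (len(cells) * mask.sum())
    return weights   # use as per-sample loss weights / sample_weight during training

def balanced_accuracy(y_true, y_pred):
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))

def per_group_thresholds(scores, labels, groups, grid=np.linspace(0.1, 0.9, 81)):
    """Pick, per group, the decision threshold that maximises balanced accuracy."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    return {g: max(grid, key=lambda t: balanced_accuracy(labels[groups == g],
                                                         scores[groups == g] >= t))
            for g in np.unique(groups)}
```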
https://arxiv.org/abs/2303.11449
Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization capability in the pre-trained models. However, most of these methods employ manually crafted heuristics or expensive hyper-parameter searches, which prevent them from scaling up to large datasets and neural networks. To solve this problem, we propose the Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed on each layer for fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, for each layer, and enforces them through weight projections. To learn the constraints, we propose a bi-level optimization to automatically learn the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation is the key to learning different constraints for each layer. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, compared to vanilla fine-tuning, TPGM shows 22% and 10% relative OOD improvement respectively on their sketch counterparts. Code is available at this https URL.
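The projection step at the core of TPGM can be sketched as follows: after each optimizer update, each layer's weights are projected back into an L2 ball of a given radius around the pretrained weights. In TPGM the radii are themselves learned by the bi-level outer loop; here they are taken as given constants for clarity.

```python
# Sketch: per-layer projection of fine-tuned weights onto a ball around the
# pretrained weights; the radii would normally come from the bi-level loop.
import torch

@torch.no_grad()
def project_to_pretrained(model, pretrained_state, radii):
    for name, param in model.named_parameters():
        delta = param - pretrained_state[name]
        norm = delta.norm()
        r = radii[name]
        if norm > r:                         # outside the constraint set: project
            param.copy_(pretrained_state[name] + delta * (r / norm))

# Usage per training step:
#   loss.backward(); optimizer.step()
#   project_to_pretrained(model, pretrained_state, radii)
```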
https://arxiv.org/abs/2303.10720
High-precision and rapid detection of pathologies on chest X-rays makes it possible to detect the development of pneumonia at an early stage and begin immediate treatment. Artificial intelligence can speed up and qualitatively improve the X-ray analysis procedure and give the doctor recommendations for additional consideration of suspicious images. The purpose of this study is to determine the best models and implementations of the transfer learning method for the binary classification problem in the presence of a small amount of training data. In this article, various methods of augmenting the initial data and approaches to training ResNet and DenseNet models on black-and-white X-ray images are considered, and the approaches that yield the highest accuracy in distinguishing pneumonia cases from the norm at the testing stage are identified.
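A minimal sketch of the kind of augmentation-plus-transfer-learning pipeline compared in the study, assuming torchvision transforms on grayscale X-rays feeding an ImageNet-pretrained DenseNet-121; the specific transform set and magnitudes are assumptions, not the paper's best-performing recipe.

```python
# Sketch: augmentation pipeline for grayscale chest X-rays + pretrained DenseNet.
import torch.nn as nn
from torchvision import models, transforms

train_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # replicate the single channel
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224, scale=(0.85, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # pneumonia vs. norm
```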
https://arxiv.org/abs/2303.10601
Model adaptation aims to solve the domain transfer problem under the constraint of only accessing the pretrained source models. With increasing considerations of data privacy and transmission efficiency, this paradigm has been gaining recent popularity. This paper studies the vulnerability of model adaptation algorithms to universal attacks transferred from the source domain due to the existence of malicious providers. We explore both universal adversarial perturbations and backdoor attacks as loopholes on the source side and discover that they still survive in the target models after adaptation. To address this issue, we propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms. AdaptGuard avoids direct use of the risky source parameters through knowledge distillation and utilizes pseudo-adversarial samples under an adjusted radius to enhance robustness. AdaptGuard is a plug-and-play module that requires neither robust pretrained models nor any changes to the subsequent model adaptation algorithms. Extensive results on three commonly used datasets and two popular adaptation methods validate that AdaptGuard can effectively defend against universal attacks while maintaining clean accuracy in the target domain. We hope this research will shed light on the safety and robustness of transfer learning.
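A simplified sketch of the two ingredients named above, knowledge distillation from the risky source model and pseudo-adversarial samples under an adjusted radius; the single-step perturbation and equal loss weighting are simplifications and assumptions, not AdaptGuard's actual algorithm.

```python
# Sketch: distill the source (teacher) into a fresh student, on both clean and
# pseudo-adversarial inputs, instead of reusing the risky source weights.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, x, epsilon=2.0 / 255, T=2.0):
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / T, dim=-1)

    # Pseudo-adversarial sample: one gradient-sign step on the student,
    # bounded by the adjusted radius epsilon.
    x_adv = x.clone().requires_grad_(True)
    loss_adv = F.kl_div(F.log_softmax(student(x_adv) / T, dim=-1),
                        teacher_probs, reduction="batchmean")
    grad = torch.autograd.grad(loss_adv, x_adv)[0]
    x_adv = (x + epsilon * grad.sign()).detach()

    # Distillation on clean and perturbed inputs keeps the student close to the
    # source behaviour without copying its (possibly poisoned) parameters.
    loss = (F.kl_div(F.log_softmax(student(x) / T, dim=-1), teacher_probs,
                     reduction="batchmean")
            + F.kl_div(F.log_softmax(student(x_adv) / T, dim=-1), teacher_probs,
                       reduction="batchmean"))
    return loss
```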
https://arxiv.org/abs/2303.10594
Despite recent competitive performance across a range of vision tasks, vision Transformers still suffer from heavy computational costs. Recently, vision prompt learning has provided an economical solution to this problem without fine-tuning the whole large-scale model. However, the efficiency of existing models is still far from satisfactory due to the insertion of extensive prompt blocks and trick prompt designs. In this paper, we propose an efficient vision model named impLicit vIsion prOmpt tuNing (LION), which is motivated by deep implicit models with stable memory costs for various complex tasks. In particular, we merely insert two equilibrium implicit layers at the two ends of the pre-trained main backbone, with the parameters in the backbone frozen. Moreover, we prune the parameters in these two layers according to the lottery ticket hypothesis. The performance obtained by our LION is promising on a wide range of datasets. In particular, LION reduces the number of training parameters by up to 11.5% while obtaining higher performance than the state-of-the-art baseline VPT, especially under challenging scenes. Furthermore, we find that our proposed LION has good generalization performance, making it an easy way to boost transfer learning in the future.
https://arxiv.org/abs/2303.09992
Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training. This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image generation, DDAE has already learned strongly linearly separable representations at its intermediate layers without auxiliary encoders, thus making diffusion pre-training emerge as a general approach for self-supervised generative and discriminative learning. To verify this, we perform linear-probe and fine-tuning evaluations on multi-class datasets. Our diffusion-based approach achieves 95.9% and 50.0% linear probe accuracies on CIFAR-10 and Tiny-ImageNet, respectively, and is comparable to masked autoencoders and contrastive learning for the first time. Additionally, transfer learning from ImageNet confirms DDAE's suitability for latent-space Vision Transformers, suggesting the potential for scaling DDAEs as unified foundation models.
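A minimal sketch of the linear-probe evaluation described above: freeze the network, pool activations from an intermediate layer, and train only a linear classifier on top. The dummy denoiser stands in for the pretrained diffusion U-Net and is purely an assumption for illustration.

```python
# Sketch: linear probe on pooled intermediate activations of a frozen denoiser.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, x, t):
        with torch.no_grad():
            feats = self.backbone(x, t)        # (B, C, H, W) mid-layer activations
            feats = feats.mean(dim=(2, 3))     # global average pooling -> (B, C)
        return self.head(feats)

# Stand-in for the denoising network; in the paper the features come from an
# intermediate layer of the pretrained diffusion model at a chosen noise level.
class DummyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 256, 3, stride=2, padding=1)

    def forward(self, x, t):
        return torch.relu(self.conv(x))        # t (noise level) ignored here

probe = LinearProbe(DummyDenoiser(), feature_dim=256, num_classes=10)
logits = probe(torch.randn(4, 3, 32, 32), t=torch.zeros(4))
```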
https://arxiv.org/abs/2303.09769
Solving multiple visual tasks using individual models can be resource-intensive, while multi-task learning can conserve resources by sharing knowledge across different tasks. Despite the benefits of multi-task learning, such techniques can struggle with balancing the loss for each task, leading to potential performance degradation. We present a novel computation- and parameter-sharing framework that balances efficiency and accuracy to perform multiple visual tasks utilizing individually-trained single-task transformers. Our method is motivated by transfer learning schemes to reduce computational and parameter storage costs while maintaining the desired performance. Our approach involves splitting the tasks into a base task and the other sub-tasks, and sharing a significant portion of activations and parameters/weights between the base and sub-tasks to decrease inter-task redundancies and enhance knowledge sharing. The evaluation conducted on NYUD-v2 and PASCAL-context datasets shows that our method is superior to the state-of-the-art transformer-based multi-task learning techniques with higher accuracy and reduced computational resources. Moreover, our method is extended to video stream inputs, further reducing computational costs by efficiently sharing information across the temporal domain as well as the task domain. Our codes and models will be publicly available.
https://arxiv.org/abs/2303.09663
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs), which aids various RS applications such as land cover, land use, human development analysis, and disaster response. The performance of existing RS-CD methods is attributed to training on large annotated datasets. Furthermore, most of these models are less transferable in the sense that the trained model often performs very poorly when there is a domain gap between training and test datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues. Given an MT-RSI, the proposed method generates the corresponding change probability map by iteratively optimizing an unsupervised CD loss without training on a large dataset. Our unsupervised CD method consists of two interconnected deep networks, namely a Deep-Change Probability Generator (D-CPG) and a Deep-Feature Extractor (D-FE). The D-CPG is designed to predict change and no-change probability maps for a given MT-RSI, while D-FE is used to extract deep features of the MT-RSI that are further used in the proposed unsupervised CD loss. We use transfer learning to initialize the parameters of D-FE. We iteratively optimize the parameters of D-CPG and D-FE for a given MT-RSI by minimizing the proposed unsupervised "similarity-dissimilarity loss". This loss is motivated by the principle of metric learning, where we simultaneously maximize the distance between changed pixel pairs and minimize the distance between unchanged pixel pairs in the bi-temporal image domain and their deep feature domain. Experiments conducted on three CD datasets show that our unsupervised CD method achieves significant improvements over state-of-the-art supervised and unsupervised CD methods. Code available at this https URL
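A minimal sketch of a similarity-dissimilarity (contrastive, metric-learning) loss of the kind described: unchanged pixel pairs are pulled together and changed pairs pushed apart in feature space, weighted by the current change-probability map. The margin and weighting are assumptions, not the paper's exact formulation.

```python
# Sketch: contrastive-style loss over per-pixel deep features of the two dates,
# weighted by the change probabilities predicted by the D-CPG-like network.
import torch
import torch.nn.functional as F

def similarity_dissimilarity_loss(feat_t1, feat_t2, change_prob, margin=1.0):
    """feat_t1, feat_t2: (B, C, H, W) deep features of the two dates (D-FE);
    change_prob: (B, 1, H, W) change probabilities (D-CPG)."""
    dist = F.pairwise_distance(
        feat_t1.permute(0, 2, 3, 1).reshape(-1, feat_t1.size(1)),
        feat_t2.permute(0, 2, 3, 1).reshape(-1, feat_t2.size(1)),
    )                                            # per-pixel feature distance
    p_change = change_prob.reshape(-1)
    no_change_term = (1 - p_change) * dist.pow(2)            # pull together
    change_term = p_change * F.relu(margin - dist).pow(2)    # push apart
    return (no_change_term + change_term).mean()

loss = similarity_dissimilarity_loss(torch.randn(2, 64, 32, 32),
                                     torch.randn(2, 64, 32, 32),
                                     torch.rand(2, 1, 32, 32))
```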
https://arxiv.org/abs/2303.09536