Paper Reading AI Learner

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

2024-05-05 14:48:13
Wenyu Zhang, Li Shen, Chuan-Sheng Foo

Abstract

Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a feature extractor pre-trained on large-scale data (e.g. ImageNet) is used to initialize the source model at the start of source training and is subsequently discarded. Although it provides diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target-domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework that incorporates pre-trained networks into the target adaptation process. The framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm, which improves target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension, Co-learn++, that further incorporates CLIP's zero-shot classification decisions. We evaluate on three benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods.
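The abstract does not spell out how the source model and the pre-trained feature extractor collaborate, but the agreement-based filtering idea can be sketched as follows. This is an illustrative interpretation only, not the paper's exact algorithm: the function name, the unanimous-agreement rule, and the mean-confidence threshold are all assumptions. Passing a third set of CLIP zero-shot probabilities loosely mirrors the Co-learn++ extension.

```python
import numpy as np

def colearn_pseudolabels(src_probs, feat_probs, clip_probs=None, conf_threshold=0.5):
    # Hypothetical sketch of collaborative pseudolabeling: each classifier
    # votes with its argmax class; a pseudolabel is kept only when all voters
    # agree and the mean confidence of the winning class clears a threshold.
    voters = [src_probs, feat_probs]
    if clip_probs is not None:          # Co-learn++-style: add CLIP zero-shot votes
        voters.append(clip_probs)
    preds = np.stack([p.argmax(axis=1) for p in voters])   # (n_voters, n_samples)
    labels = preds[0]                                      # source-model prediction
    agree = (preds == labels).all(axis=0)                  # unanimous agreement
    conf = np.mean([p.max(axis=1) for p in voters], axis=0)
    keep = agree & (conf >= conf_threshold)
    return labels, keep                                    # pseudolabels + keep mask
```

Samples whose pseudolabels are rejected would then be handled by whatever fallback the adaptation method uses (e.g. excluded from the self-training loss); the actual selection and weighting rules are described in the paper itself.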

URL

https://arxiv.org/abs/2405.02954

PDF

https://arxiv.org/pdf/2405.02954.pdf

