Paper Reading AI Learner

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

2024-05-05 14:48:13
Wenyu Zhang, Li Shen, Chuan-Sheng Foo

Abstract

Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a feature extractor pre-trained on large-scale data (e.g. ImageNet) is used to initialize the source model at the start of source training and is subsequently discarded. Although it provides diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target-domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework that incorporates pre-trained networks into the target adaptation process. The framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm, which improves target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension, Co-learn++, that further incorporates CLIP's zero-shot classification decisions. We evaluate on three benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods.
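The abstract does not spell out how the source model and the pre-trained feature extractor collaborate, but the agreement-based filtering idea can be sketched as follows. This is an illustrative interpretation only, not the paper's exact algorithm: the function name, the unanimous-agreement rule, and the mean-confidence threshold are all assumptions. Passing a third set of CLIP zero-shot probabilities loosely mirrors the Co-learn++ extension.

```python
import numpy as np

def colearn_pseudolabels(src_probs, feat_probs, clip_probs=None, conf_threshold=0.5):
    # Hypothetical sketch of collaborative pseudolabeling: each classifier
    # votes with its argmax class; a pseudolabel is kept only when all voters
    # agree and the mean confidence of the winning class clears a threshold.
    voters = [src_probs, feat_probs]
    if clip_probs is not None:          # Co-learn++-style: add CLIP zero-shot votes
        voters.append(clip_probs)
    preds = np.stack([p.argmax(axis=1) for p in voters])   # (n_voters, n_samples)
    labels = preds[0]                                      # source-model prediction
    agree = (preds == labels).all(axis=0)                  # unanimous agreement
    conf = np.mean([p.max(axis=1) for p in voters], axis=0)
    keep = agree & (conf >= conf_threshold)
    return labels, keep                                    # pseudolabels + keep mask
```

Samples whose pseudolabels are rejected would then be handled by whatever fallback the adaptation method uses (e.g. excluded from the self-training loss); the actual selection and weighting rules are described in the paper itself.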

URL

https://arxiv.org/abs/2405.02954

PDF

https://arxiv.org/pdf/2405.02954.pdf

