Abstract
Recently, self-training and active learning have been proposed to alleviate the high annotation cost of domain adaptive semantic segmentation. Self-training can improve model accuracy by exploiting massive unlabeled data, but when the training data are limited or imbalanced it generates noisy pseudo-labels, and without human guidance the model converges to a suboptimal solution. Active learning can select the most informative data for human intervention, but it leaves the massive unlabeled data unused, so model accuracy cannot be improved; moreover, when the domain gap is large, the probability of querying suboptimal samples rises, increasing annotation cost. This paper proposes an iterative loop learning method combining Self-Training and Active Learning (STAL) for domain adaptive semantic segmentation. The method first applies self-training to massive unlabeled data, improving model accuracy and providing a more reliable model for active-learning sample selection. It then uses the active-learning selection strategy to obtain human annotations that correct errors accumulated during self-training. The two stages alternate in a loop to reach the best performance at minimal labeling cost. Extensive experiments show that our method establishes state-of-the-art performance on the GTAV to Cityscapes and SYNTHIA to Cityscapes tasks, improving over the previous best method by 4.9% mIoU and 5.2% mIoU, respectively. Code will be available.
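The alternating procedure described in the abstract can be sketched as a toy loop. Everything below is an illustrative assumption, not the paper's implementation: `train` is a stand-in majority-label classifier rather than a segmentation network, `oracle` stands in for the human annotator, and the query policy simply takes the first `budget` samples where the paper would use its own sample-selection strategy.

```python
from collections import Counter

def train(pairs):
    # Stand-in "model": predicts the majority label seen in `pairs`.
    # (Illustrative only; the paper trains a segmentation network.)
    majority = Counter(label for _, label in pairs).most_common(1)[0][1]
    return lambda x: majority

def stal_loop(labeled, unlabeled, oracle, rounds=2, budget=1):
    """Toy STAL-style loop: alternate self-training on pseudo-labels
    with active-learning queries that move samples to the labeled set."""
    model = train(labeled)
    for _ in range(rounds):
        # Self-training: pseudo-label the unlabeled pool and retrain.
        pseudo = [(x, model(x)) for x in unlabeled]
        model = train(labeled + pseudo)
        # Active learning (stand-in policy: take the first `budget`
        # samples; a real acquisition function would rank candidates).
        queried, unlabeled = unlabeled[:budget], unlabeled[budget:]
        # Human annotation corrects the self-training signal.
        labeled += [(x, oracle[x]) for x in queried]
        model = train(labeled)
    return model, labeled, unlabeled

# Tiny synthetic example: two labeled samples, three unlabeled ones.
labeled = [(0, "road"), (1, "road")]
unlabeled = [2, 3, 4]
oracle = {2: "car", 3: "car", 4: "car"}
model, labeled, unlabeled = stal_loop(labeled, unlabeled, oracle)
```

After two rounds with a budget of one query per round, two samples have been moved from the unlabeled pool into the labeled set with oracle labels, mirroring how human intervention incrementally corrects the pseudo-label training signal.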
URL
https://arxiv.org/abs/2301.13361