Abstract
Domain adaptive pose estimation aims to enable deep models trained on source-domain (synthesized) datasets to produce similar results on target-domain (real-world) datasets. Existing methods have made significant progress by conducting image-level or feature-level alignment. However, aligning at only a single level is not sufficient to fully bridge the domain gap and achieve excellent domain adaptive results. In this paper, we propose a multi-level domain adaptation approach, which aligns different domains at the image, feature, and pose levels. Specifically, we first utilize image style transfer to ensure that images from the source and target domains have a similar distribution. Subsequently, at the feature level, we employ adversarial training to make the features from the source and target domains preserve domain-invariant characteristics as much as possible. Finally, at the pose level, a self-supervised approach is utilized to enable the model to learn diverse knowledge, implicitly addressing the domain gap. Experimental results demonstrate that the proposed multi-level alignment method achieves significant improvement in pose estimation, outperforming the previous state of the art by up to 2.4% in human pose estimation and, in animal pose estimation, by up to 3.1% for dogs and 1.4% for sheep.
URL
https://arxiv.org/abs/2404.14885