Abstract
In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++ that effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) to patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, as the distinct projections make it less possible to directly transfer knowledge between domains, we then propose Reliable Panoramic Prototype Adaptation Module (RP2AM) to transfer knowledge at both prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.
Abstract (translated)
在本文中,我们解决了仅使用预训练的孔洞图(source)和未标注的全景图像(target)进行无监督域适应(SFUDA)的问题,以实现孔洞到全景语义分割。解决这一问题是不简单的,因为存在三个关键挑战:1)不同域之间语义不匹配,2)源域问题中的风格差异,3)全景图像中不可避免的扭曲。为了应对这些问题,我们提出了360SFUDA++,它有效地从仅有的未标注全景图像中提取知识,并将可靠的知识传递到目标全景域。具体来说,我们首先利用切线投影(TP)作为它具有较少的扭曲,同时将等角投影(ERP)切成固定 FoV 投影(FFP)的补丁,以模仿孔洞图像。两个投影在提取知识方面都有效。然而,由于不同的投影使得域之间知识传递变得困难,我们 then 引入了可靠的全景原型适应模块(RP2AM),在预测和原型级别上传递知识。RP2AM 选择自信的知识,并整合全景原型以实现可靠的知识适应。此外,我们还引入了跨投影双重注意模块(CDAM),它更好地对域之间的特征水平进行投影之间的空间和通道特征的同步调整。知识提取和传递过程都被同步更新,以达到最佳性能。在合成和真实世界基准上的广泛实验,包括户外和室内场景,证明了我们的360SFUDA++在性能上显著优于前面的SFUDA方法。
URL
https://arxiv.org/abs/2404.16501