Abstract
When deploying pre-trained video object detectors in real-world scenarios, the domain gap between training and testing data caused by adverse image conditions often leads to performance degradation. Addressing this issue becomes particularly challenging when only the pre-trained model and degraded videos are available. Although various source-free domain adaptation (SFDA) methods have been proposed for single-frame object detectors, SFDA for video object detection (VOD) remains unexplored. Moreover, most unsupervised domain adaptation works for object detection rely on two-stage detectors, while SFDA for one-stage detectors, which are more vulnerable to fine-tuning, is not well addressed in the literature. In this paper, we propose Spatial-Temporal Alternate Refinement with Mean Teacher (STAR-MT), a simple yet effective SFDA method for VOD. Specifically, we aim to improve the performance of the one-stage VOD method, YOLOV, under adverse image conditions, including noise, air turbulence, and haze. Extensive experiments on the ImageNetVOD dataset and its degraded versions demonstrate that our method consistently improves video object detection performance in challenging imaging conditions, showcasing its potential for real-world applications.
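The "Mean Teacher" in STAR-MT refers to the standard teacher-student scheme in which the teacher network's weights are an exponential moving average (EMA) of the student's. The abstract does not give the paper's exact update schedule, so the sketch below is a minimal, hypothetical illustration of the EMA step (parameter names and the momentum value are assumptions, not taken from the paper):

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """Update teacher parameters in place as an EMA of the student's.

    teacher_params / student_params: dicts mapping parameter names to values
    (in a real detector these would be tensors; plain floats suffice here).
    momentum: how strongly the teacher retains its old weights; values near
    1.0 make the teacher a slowly-evolving average of student snapshots.
    """
    for name, s in student_params.items():
        t = teacher_params[name]
        teacher_params[name] = momentum * t + (1.0 - momentum) * s
    return teacher_params
```

After each student optimization step on pseudo-labeled target-domain frames, calling `ema_update` nudges the teacher toward the student, yielding the smoothed model that typically generates the pseudo labels in mean-teacher self-training.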
URL
https://arxiv.org/abs/2404.15252