
Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions

2024-04-23 17:39:06
Xingguang Zhang, Chih-Hsien Chou

Abstract

When deploying pre-trained video object detectors in real-world scenarios, the domain gap between training and testing data caused by adverse image conditions often leads to performance degradation. Addressing this issue becomes particularly challenging when only the pre-trained model and degraded videos are available. Although various source-free domain adaptation (SFDA) methods have been proposed for single-frame object detectors, SFDA for video object detection (VOD) remains unexplored. Moreover, most unsupervised domain adaptation works for object detection rely on two-stage detectors, while SFDA for one-stage detectors, which are more vulnerable to fine-tuning, is not well addressed in the literature. In this paper, we propose Spatial-Temporal Alternate Refinement with Mean Teacher (STAR-MT), a simple yet effective SFDA method for VOD. Specifically, we aim to improve the performance of the one-stage VOD method, YOLOV, under adverse image conditions, including noise, air turbulence, and haze. Extensive experiments on the ImageNet VID dataset and its degraded versions demonstrate that our method consistently improves video object detection performance in challenging imaging conditions, showcasing its potential for real-world applications.
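The abstract names the Mean Teacher framework as the backbone of STAR-MT's refinement stages. The alternating spatial/temporal refinement itself is specific to YOLOV's architecture, but the underlying mean-teacher self-training loop is standard and can be sketched generically. Below is a minimal PyTorch sketch under that assumption; the `detection_loss` callable, the noise-based strong augmentation, and using the teacher's raw output as pseudo labels are hypothetical placeholders, not the authors' implementation.

    import copy
    import torch

    def ema_update(teacher, student, momentum=0.999):
        """Teacher weights track an exponential moving average of the student's."""
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(momentum).add_(s, alpha=1.0 - momentum)

    def adapt(student, detection_loss, target_loader, optimizer, momentum=0.999):
        """One pass of mean-teacher self-training on unlabeled target videos.

        Source-free setting: only the pre-trained detector and the degraded
        target-domain frames are used; source data is never revisited.
        """
        teacher = copy.deepcopy(student)          # teacher starts as a frozen copy
        for p in teacher.parameters():
            p.requires_grad_(False)

        student.train()
        teacher.eval()
        for frames in target_loader:              # batches of degraded video frames
            with torch.no_grad():
                pseudo_labels = teacher(frames)   # teacher predictions as pseudo labels
            # Hypothetical strong augmentation: additive Gaussian noise.
            strong = frames + 0.05 * torch.randn_like(frames)
            loss = detection_loss(student(strong), pseudo_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ema_update(teacher, student, momentum)  # teacher slowly tracks the student
        return teacher                            # the smoothed teacher is the adapted model

The EMA teacher changes slowly, so its pseudo labels are more stable than the student's own predictions; this stability is generally what keeps self-training from collapsing on one-stage detectors that are sensitive to fine-tuning.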


URL

https://arxiv.org/abs/2404.15252

PDF

https://arxiv.org/pdf/2404.15252.pdf

