Abstract
Human action recognition in dark videos is a challenging task for computer vision. Recent research focuses on applying dark enhancement methods to improve the visibility of the video. However, such video processing results in the loss of critical information in the original (un-enhanced) video. Conversely, traditional two-stream methods are capable of learning information from both original and processed videos, but they lead to a significant increase in computational cost during the inference phase of video classification. To address these challenges, we propose a novel teacher-student video classification framework, named Dual-Light KnowleDge Distillation for Action Recognition in the Dark (DL-KDD). This framework enables the model to learn from both original and enhanced video without introducing additional computational cost during inference. Specifically, DL-KDD employs a knowledge distillation strategy during training. The teacher model is trained with enhanced video, and the student model is trained with both the original video and the soft targets generated by the teacher model. This teacher-student framework allows the student model to predict actions using only the original input video during inference. In our experiments, the proposed DL-KDD framework outperforms state-of-the-art methods on the ARID, ARID V1.5, and Dark-48 datasets. We achieve the best performance on each dataset and up to a 4.18% improvement on Dark-48, using only original video inputs, thus avoiding the use of a two-stream framework or enhancement modules during inference. We further validate the effectiveness of the distillation strategy in ablation experiments. The results highlight the advantages of our knowledge distillation framework for human action recognition in the dark.
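The training objective described above (a student supervised by both ground-truth labels and the teacher's soft targets) can be sketched as a generic soft-target distillation loss. This is an illustrative sketch only: the abstract does not specify DL-KDD's actual loss, temperature `T`, or weighting `alpha`, so the values and formulation below are standard Hinton-style distillation assumptions, not the paper's implementation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    m = max(z / T for z in logits)           # subtract max for numerical stability
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Generic knowledge-distillation loss (assumed form, not DL-KDD's exact loss).

    student_logits: student output on the ORIGINAL (dark) video clip.
    teacher_logits: teacher output on the ENHANCED video clip (soft target).
    label: ground-truth action class index.
    """
    # Hard-label branch: cross-entropy on the student's normal-temperature output.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)

    # Soft-target branch: KL(teacher || student) on temperature-softened outputs.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * (math.log(pt + 1e-12) - math.log(ps + 1e-12))
             for pt, ps in zip(p_t, p_s))

    # The T**2 factor keeps soft-target gradients on the same scale as the CE term.
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

loss = kd_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9], label=0)
```

At inference time only the student forward pass on the original video is needed, which is why no enhancement module or second stream adds cost at deployment.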
URL
https://arxiv.org/abs/2406.02468