DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark

2024-06-04 16:38:06
Chi-Jui Chang, Oscar Tai-Yuan Chen, Vincent S. Tseng

Abstract

Human action recognition in dark videos is a challenging task for computer vision. Recent research focuses on applying dark enhancement methods to improve the visibility of the video. However, such video processing results in the loss of critical information in the original (un-enhanced) video. Conversely, traditional two-stream methods can learn from both original and processed videos, but they significantly increase the computational cost of inference in video classification. To address these challenges, we propose a novel teacher-student video classification framework, named Dual-Light KnowleDge Distillation for Action Recognition in the Dark (DL-KDD). This framework enables the model to learn from both original and enhanced videos without introducing additional computational cost during inference. Specifically, DL-KDD uses knowledge distillation during training. The teacher model is trained with enhanced video, and the student model is trained with both the original video and the soft targets generated by the teacher model. This teacher-student framework allows the student model to predict actions using only the original input video during inference. In our experiments, the proposed DL-KDD framework outperforms state-of-the-art methods on the ARID, ARID V1.5, and Dark-48 datasets. We achieve the best performance on each dataset and up to a 4.18% improvement on Dark-48, using only original video inputs, thus avoiding the use of a two-stream framework or enhancement modules for inference. We further validate the effectiveness of the distillation strategy in ablation experiments. The results highlight the advantages of our knowledge distillation framework in dark human action recognition.
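
The abstract does not give the exact training objective, so the following is only a minimal sketch of a generic soft-target distillation loss (the widely used formulation with a temperature-scaled softmax), not the DL-KDD loss itself. The function names, the temperature, and the weighting factor alpha are assumptions made for illustration. In the setting described above, the teacher logits would come from the enhanced video and the student logits from the original video of the same clip.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],  # assumed: student output on the original video
    teacher_logits: Sequence[float],  # assumed: teacher output on the enhanced video
    label: int,                       # ground-truth action class index
    temperature: float = 4.0,         # hypothetical value, not taken from the paper
    alpha: float = 0.5,               # hypothetical hard/soft weighting
) -> float:
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Hard-label term: cross-entropy of the student's ordinary prediction.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-target term: KL(teacher || student) at the softened temperature.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(
        q * math.log(q / p)
        for q, p in zip(teacher_soft, student_soft)
        if q > 0.0
    )

    # The temperature**2 factor keeps the soft-term gradient scale comparable.
    return alpha * cross_entropy + (1.0 - alpha) * temperature ** 2 * kl
```

At inference time only the student is evaluated on the original video, which is why a scheme of this kind adds no cost over a single-stream model.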

URL

https://arxiv.org/abs/2406.02468

PDF

https://arxiv.org/pdf/2406.02468.pdf

