Paper Reading AI Learner

Treasure What You Have: Exploiting Similarity in Deep Neural Networks for Efficient Video Processing

2023-05-10 23:18:47
Hadjer Benmeziane, Halima Bouzidi, Hamza Ouarnoughi, Ozcan Ozturk, Smail Niar

Abstract

Deep learning has enabled various Internet of Things (IoT) applications. Still, designing models with high accuracy and computational efficiency remains a significant challenge, especially in real-time video processing applications. Such applications exhibit high inter- and intra-frame redundancy, allowing further improvement. This paper proposes a similarity-aware training methodology that exploits data redundancy in video frames for efficient processing. Our approach introduces a per-layer regularization that enhances computation reuse by increasing the similarity of weights during training. We validate our methodology on two critical real-time applications, lane detection and scene parsing. We observe an average compression ratio of approximately 50% and a speedup of \sim 1.5x for different models while maintaining the same accuracy.

Abstract (translated)

深度学习已经使各种物联网(IoT)应用得以实现。然而,设计高精度和高计算效率的模型仍然是一个 significant 挑战,特别是实时视频处理应用中。这些应用表现出内外帧的冗余,从而允许进一步改进。本文提出了一种相似性 aware 的训练方法,利用视频帧中的数据冗余以高效处理。我们的方法引入了每层 Regularization,在训练期间通过增加权重之间的相似性来提高计算重用。我们对两个关键实时应用,车道检测和场景解析进行了验证。我们观察到平均压缩比例约为 50%,不同模型的速度up 达到了 \sim 1.5x,同时保持相同的精度。

URL

https://arxiv.org/abs/2305.06492

PDF

https://arxiv.org/pdf/2305.06492.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot