Paper Reading AI Learner

Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

2024-04-17 15:45:49
Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

Abstract

This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. The method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25× by alternating the processing of high-resolution images (320×320 pixels) with multiple down-sized frames (192×192 pixels). To counter the accuracy degradation caused by the smaller input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassifications with a novel probabilistic Rescore algorithm. By feeding different state-of-the-art DNN object detectors two down-sized images for every high-resolution one and combining them with MR2-ByteTrack, we demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller, compared with a baseline frame-by-frame inference scheme that uses only full-resolution images. Code available at: this https URL
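The interleaving described above (one 320×320 frame followed by two 192×192 frames) can be sketched as a periodic resolution schedule. The Python below is an illustrative sketch, not the authors' code: the names `resolution_schedule`, `relative_load`, and `low_per_high` are hypothetical, and it assumes, as a first-order approximation, that detector compute scales with the number of input pixels. Real DNN cost and measured latency on GAP9 will differ, so its outputs are not the paper's results.

```python
from itertools import cycle, islice

HIGH_RES = 320  # full-resolution input side length in pixels (from the abstract)
LOW_RES = 192   # down-sized input side length in pixels (from the abstract)


def resolution_schedule(num_frames, low_per_high=2):
    """Return the input side length used for each frame: one high-resolution
    frame followed by `low_per_high` low-resolution frames, repeated."""
    pattern = [HIGH_RES] + [LOW_RES] * low_per_high
    return list(islice(cycle(pattern), num_frames))


def relative_load(schedule):
    """Average per-frame load relative to always using HIGH_RES.

    ASSUMPTION: cost is proportional to input pixel count (side * side).
    """
    full = HIGH_RES * HIGH_RES
    return sum(side * side for side in schedule) / (len(schedule) * full)


if __name__ == "__main__":
    sched = resolution_schedule(6)
    print(sched)  # [320, 192, 192, 320, 192, 192]
    load = relative_load(sched)
    print(f"relative load: {load:.3f}, speed-up: {1 / load:.2f}x")
    # prints "relative load: 0.573, speed-up: 1.74x"
```

Under this pixel-count assumption, the 1:2 pattern needs about 0.573 of the full-resolution load; raising `low_per_high` pushes the estimate toward the 320²/192² ≈ 2.78× ceiling, at the cost of fewer full-resolution frames for the tracker to rely on.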

URL

https://arxiv.org/abs/2404.11488

PDF

https://arxiv.org/pdf/2404.11488.pdf
