Paper Reading AI Learner

Fast object detection in compressed JPEG Images

2019-04-16 22:10:53
Benjamin Deguerre, Clément Chatelain, Gilles Gasso

Abstract

Object detection in still images has drawn a lot of attention over the past few years, and with the advent of deep learning, impressive performance has been achieved in numerous industrial applications. Most of these deep learning models rely on RGB images to localize and identify objects. However, in some application scenarios, images are compressed for storage savings or fast transmission, so a time-consuming decompression step is required before these deep models can be applied. To alleviate this drawback, we propose a fast deep architecture for object detection in JPEG images, one of the most widespread compression formats. We train a neural network to detect objects based on the blockwise DCT (discrete cosine transform) coefficients produced by the JPEG compression algorithm. We modify the well-known Single Shot MultiBox Detector (SSD) by replacing its first layers with one convolutional layer dedicated to processing the DCT inputs. Experimental evaluations on PASCAL VOC and an industrial dataset comprising road traffic surveillance images show that the model is about $2\times$ faster than regular SSD, with promising detection performance. To the best of our knowledge, this paper is the first to address detection in compressed JPEG images.
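
To make the "blockwise DCT coefficients" concrete, here is a minimal NumPy sketch of the transform JPEG applies to each 8x8 block of a single channel. It is an illustration only, not the authors' pipeline: a real system would read the coefficients from the JPEG bitstream after entropy decoding instead of recomputing them from pixels, and it would also have to handle quantization, the YCbCr conversion, and chroma subsampling. The function names `dct_matrix` and `blockwise_dct` and the output layout `(H/8, W/8, 64)` are assumptions chosen for this example; the abstract does not specify how the detector's input tensor is arranged.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return m


def blockwise_dct(channel, block=8):
    """Compute the 2D DCT of each non-overlapping block of a 2D array.

    Returns an array of shape (H/block, W/block, block*block), i.e. one
    vector of coefficients per block (hypothetical layout for illustration).
    """
    h, w = channel.shape
    if h % block or w % block:
        raise ValueError("height and width must be multiples of the block size")
    d = dct_matrix(block)
    # Split into blocks: (H/b, b, W/b, b) -> (H/b, W/b, b, b)
    blocks = channel.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    # JPEG shifts 8-bit samples by 128 before the transform
    blocks = blocks.astype(np.float64) - 128.0
    # Separable 2D DCT, broadcast over all blocks
    coeffs = d @ blocks @ d.T
    return coeffs.reshape(h // block, w // block, block * block)


# Usage: a 16x24 channel yields a 2x3 grid of blocks, 64 coefficients each.
demo = np.full((16, 24), 138, dtype=np.uint8)
demo_coeffs = blockwise_dct(demo)
```

With this layout, each spatial position of the coefficient tensor corresponds to one 8x8 pixel block, so the input is already 8 times smaller per side than the RGB image. That is consistent with the idea of replacing the first layers of SSD with a single convolution over DCT inputs, although the exact layer configuration is not given in the abstract.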

URL

https://arxiv.org/abs/1904.08408

PDF

https://arxiv.org/pdf/1904.08408.pdf

