
Learning a Single Tucker Decomposition Network for Lossy Image Compression with Multiple Bits-Per-Pixel Rates

2018-07-10 03:40:36
Jianrui Cai, Zisheng Cao, Lei Zhang

Abstract

Lossy image compression (LIC), which aims to utilize inexact approximations to represent an image more compactly, is a classical problem in image processing. Recently, deep convolutional neural networks (CNNs) have achieved interesting results in LIC by learning an encoder-quantizer-decoder network from a large amount of data. However, existing CNN-based LIC methods usually can only train a network for a specific bits-per-pixel (bpp) rate. Such a "one network per bpp" problem limits the generality and flexibility of CNNs in practical LIC applications. In this paper, we propose to learn a single CNN which can perform LIC at multiple bpp rates. A simple yet effective Tucker Decomposition Network (TDNet) is developed, where a novel Tucker decomposition layer (TDL) decomposes a latent image representation into a set of projection matrices and a core tensor. By changing the rank of the core tensor and its quantization, we can easily adjust the bpp rate of the latent image representation within a single CNN. Furthermore, an iterative non-uniform quantization scheme is presented to optimize the quantizer, and a coarse-to-fine training strategy is introduced to reconstruct the decompressed images. Extensive experiments demonstrate the state-of-the-art compression performance of TDNet in terms of both PSNR and MS-SSIM indices.
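
To make the role of the Tucker decomposition concrete, the sketch below runs a truncated higher-order SVD on a toy latent tensor and shows how shrinking the core-tensor ranks reduces the number of values that would have to be quantized and coded. The shapes, ranks, and the simple uniform quantizer are assumptions for illustration only; the paper's TDL is learned end-to-end inside the CNN and uses an iterative non-uniform quantizer.

# A minimal sketch of truncated Tucker decomposition (HOSVD) on a latent
# feature tensor, showing how the chosen core ranks control how many values
# must be quantized and coded (and hence the bpp). Illustrative assumptions
# only; this is not the authors' TDNet/TDL implementation.
import numpy as np

def unfold(tensor, mode):
    # Mode-n unfolding: bring `mode` to the front and flatten the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    # Mode-n product: multiply `matrix` into the given mode of `tensor`.
    rest = [s for i, s in enumerate(tensor.shape) if i != mode]
    res = (matrix @ unfold(tensor, mode)).reshape([matrix.shape[0]] + rest)
    return np.moveaxis(res, 0, mode)

def tucker_hosvd(x, ranks):
    # Projection matrices: leading left singular vectors of each mode unfolding.
    factors = [np.linalg.svd(unfold(x, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = x
    for m, u in enumerate(factors):      # core tensor = x projected onto the truncated bases
        core = mode_dot(core, u.T, m)
    return core, factors

def tucker_reconstruct(core, factors):
    x_hat = core
    for m, u in enumerate(factors):      # multiply the projection matrices back in
        x_hat = mode_dot(x_hat, u, m)
    return x_hat

# Hypothetical latent representation (C x H x W feature maps from an encoder).
latent = np.random.randn(64, 16, 16)

for ranks in [(16, 8, 8), (32, 12, 12), (64, 16, 16)]:   # lower ranks -> fewer symbols -> lower bpp
    core, factors = tucker_hosvd(latent, ranks)
    q_core = np.round(core / 0.25) * 0.25                # toy uniform quantizer, step 0.25
    recon = tucker_reconstruct(q_core, factors)
    n_vals = q_core.size + sum(u.size for u in factors)
    mse = np.mean((latent - recon) ** 2)
    print(f"ranks={ranks}: {n_vals} values to code, reconstruction MSE={mse:.5f}")

In this toy setting, higher ranks preserve more of the latent tensor (lower reconstruction MSE) at the cost of more values to code, mirroring the rate-distortion trade-off that the paper adjusts within a single network by changing the core-tensor rank and its quantization.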


URL

https://arxiv.org/abs/1807.03470

PDF

https://arxiv.org/pdf/1807.03470.pdf

