HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

2024-04-20 13:19:08
Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang

Abstract

This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods that transmit quantized continuous features often yield blurry and noisy reconstructions because of severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give poor-fidelity reconstructions because a limited set of codewords lacks the representation power to capture faithful details. We propose a novel dual-stream framework, HybridFlow, which combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity at extremely low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide high quality and clarity in reconstructed images. The continuous-feature stream aims to maintain fidelity details. To achieve the ultra-low bitrate, a masked token-based transformer is further proposed: only a masked portion of the codeword indices is transmitted, and the missing indices are recovered through token generation guided by information from the continuous-feature stream. We also develop a bridging correction network that merges the two streams in pixel decoding for final image reconstruction, where the continuous-stream features rectify biases of the codebook-based pixel decoder to impose fidelity details on the reconstruction. Experimental results demonstrate superior performance across several datasets at extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.
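To make the bitrate argument behind masked index transmission concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names, the 16x16 token grid, the 1024-entry codebook, the 25% keep ratio, the fixed-length index coding, and the assumption that encoder and decoder share the mask pattern (so positions cost no bits) are all hypothetical choices made for illustration. It only shows how sending fewer codeword indices lowers the bits per pixel of the index stream; the transformer that predicts the missing indices is not modeled.

```python
import math
import random


def mask_indices(indices, keep_ratio, seed=0):
    """Keep a random subset of codeword indices; dropped positions become None.

    The seed stands in for a mask pattern shared by encoder and decoder
    (an assumption of this sketch, not a detail taken from the paper).
    """
    rng = random.Random(seed)
    n = len(indices)
    n_keep = max(1, round(n * keep_ratio))
    kept_positions = set(rng.sample(range(n), n_keep))
    return [idx if pos in kept_positions else None
            for pos, idx in enumerate(indices)]


def index_stream_bpp(n_kept, codebook_size, image_height, image_width):
    """Bits per pixel when n_kept indices are sent with fixed-length codes."""
    bits_per_index = math.ceil(math.log2(codebook_size))
    return n_kept * bits_per_index / (image_height * image_width)


if __name__ == "__main__":
    # Hypothetical setup: 256x256 image, 16x16 token grid, 1024 codewords.
    codebook_size = 1024
    token_rng = random.Random(1)
    tokens = [token_rng.randrange(codebook_size) for _ in range(16 * 16)]

    masked = mask_indices(tokens, keep_ratio=0.25)
    n_kept = sum(t is not None for t in masked)

    full_bpp = index_stream_bpp(len(tokens), codebook_size, 256, 256)
    masked_bpp = index_stream_bpp(n_kept, codebook_size, 256, 256)
    print(f"kept {n_kept} of {len(tokens)} indices")
    print(f"index-stream bpp: full={full_bpp:.6f}, masked={masked_bpp:.6f}")
```

In this toy setup the decoder would receive only the kept indices and would have to predict the `None` slots, which is the role the abstract assigns to token generation guided by the continuous-feature stream. The bits spent on that continuous stream are not counted here.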

URL

https://arxiv.org/abs/2404.13372

PDF

https://arxiv.org/pdf/2404.13372.pdf

