Paper Reading AI Learner

LDMIC: Learning-based Distributed Multi-view Image Coding

2023-01-24 03:47:37
Xinjie Zhang, Jiawei Shao, Jun Zhang

Abstract

Multi-view image compression plays a critical role in 3D-related applications. Existing methods adopt a predictive coding architecture, which requires joint encoding to compress the corresponding disparity as well as residual information. This demands collaboration among cameras and enforces the epipolar geometric constraint between different views, which makes it challenging to deploy these methods in distributed camera systems with randomly overlapping fields of view. Meanwhile, distributed source coding theory indicates that efficient data compression of correlated sources can be achieved by independent encoding and joint decoding, which motivates us to design a learning-based distributed multi-view image coding (LDMIC) framework. With independent encoders, LDMIC introduces a simple yet effective joint context transfer module based on the cross-attention mechanism at the decoder to effectively capture the global inter-view correlations, which is insensitive to the geometric relationships between images. Experimental results show that LDMIC significantly outperforms both traditional and learning-based MIC methods while enjoying fast encoding speed. Code will be released at this https URL.
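The abstract does not specify the internals of the joint context transfer module. The block below is a minimal NumPy sketch, under our own assumptions, of generic single-head cross-attention between two independently encoded views. It is not the authors' LDMIC implementation. The function names, the random projection matrices `w_q`, `w_k`, `w_v`, and the residual fusion step are all hypothetical. The sketch illustrates the abstract's central claim: every position in one view attends to every position in the other, so no disparity search or epipolar constraint is needed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_context(feat_a, feat_b, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention from view A to view B.

    feat_a, feat_b: latent feature maps of shape (C, H, W), assumed to come
    from two independent encoders (spatial sizes may differ, C must match).
    w_q, w_k, w_v: hypothetical (C, C) projection matrices.
    Returns view A's features refined with context from view B, shape (C, H, W).
    """
    c, h, w = feat_a.shape
    # Flatten each map into a set of spatial tokens with C channels.
    tok_a = feat_a.reshape(c, -1).T                 # (N_a, C)
    tok_b = feat_b.reshape(feat_b.shape[0], -1).T   # (N_b, C)
    q = tok_a @ w_q                                 # queries from view A
    k = tok_b @ w_k                                 # keys from view B
    v = tok_b @ w_v                                 # values from view B
    # Global attention: each A position weighs all B positions.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_a, N_b)
    context = attn @ v                              # (N_a, C)
    # Assumed residual fusion of the transferred context into view A.
    fused = tok_a + context
    return fused.T.reshape(c, h, w)

rng = np.random.default_rng(0)
C, H, W = 8, 4, 6
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

refined_a = cross_attention_context(feat_a, feat_b, w_q, w_k, w_v)
print(refined_a.shape)  # (8, 4, 6)

# Shuffle view B's spatial positions. The result for view A is unchanged,
# because attention treats B's positions as an unordered set.
perm = rng.permutation(H * W)
feat_b_shuffled = feat_b.reshape(C, -1)[:, perm].reshape(C, H, W)
print(np.allclose(
    refined_a,
    cross_attention_context(feat_a, feat_b_shuffled, w_q, w_k, w_v)))  # True
```

The shuffle check shows one property of this sketch. The context transferred to view A does not depend on how view B's pixels are spatially arranged. This is one way to read "insensitive to the geometric relationships between images". A real codec would learn the projections and likely use multi-head attention and further layers. Those details are not given in the abstract.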

URL

https://arxiv.org/abs/2301.09799

PDF

https://arxiv.org/pdf/2301.09799.pdf
