Abstract
Multi-view image compression (MIC) plays a critical role in 3D-related applications. Existing methods adopt a predictive coding architecture, which requires joint encoding to compress the corresponding disparity as well as residual information. This demands collaboration among cameras and enforces an epipolar geometric constraint between different views, making it challenging to deploy these methods in distributed camera systems with randomly overlapping fields of view. Meanwhile, distributed source coding theory indicates that efficient compression of correlated sources can be achieved by independent encoding and joint decoding, which motivates us to design a learning-based distributed multi-view image coding (LDMIC) framework. With independent encoders, LDMIC introduces a simple yet effective joint context transfer module at the decoder, based on the cross-attention mechanism, to capture global inter-view correlations; this design is insensitive to the geometric relationships between images. Experimental results show that LDMIC significantly outperforms both traditional and learning-based MIC methods while enjoying fast encoding speed. Code will be released at this https URL.
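To make the decoder-side idea concrete, here is a minimal sketch of a cross-attention-based joint context transfer step between the latent features of two independently encoded views. This is an illustrative assumption, not the authors' exact architecture: the class name `JointContextTransfer`, the single-layer `nn.MultiheadAttention` fusion, and the residual-plus-norm combination are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn

class JointContextTransfer(nn.Module):
    """Hedged sketch: fuse one view's latent with global context drawn
    from another view via cross-attention (illustrative, not the paper's
    exact module)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Queries come from the target view, keys/values from the reference view.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, target: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # target, reference: (B, C, H, W) decoder-side latents of two views
        b, c, h, w = target.shape
        q = target.flatten(2).transpose(1, 2)       # (B, H*W, C) query tokens
        kv = reference.flatten(2).transpose(1, 2)   # (B, H*W, C) key/value tokens
        # Every target position attends to all reference positions, so the
        # correlation captured is global and needs no epipolar alignment.
        ctx, _ = self.attn(q, kv, kv)
        out = self.norm(q + ctx)                    # residual fusion
        return out.transpose(1, 2).reshape(b, c, h, w)
```

Because attention compares every spatial position of one view with every position of the other, the fusion does not assume any particular camera geometry, which matches the abstract's claim of insensitivity to geometric relationships between views.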
URL
https://arxiv.org/abs/2301.09799