Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields

Abstract

Video conferencing has attracted much more attention recently. High fidelity and low bandwidth are the two major objectives of video compression for video conferencing applications. Most pioneering methods rely on classic video compression codecs without high-level feature embedding and thus cannot reach extremely low bandwidth. Recent works instead employ model-based neural compression to achieve ultra-low bitrates using sparse representations of each frame, such as facial landmark information, but these approaches cannot maintain high fidelity because they rely on 2D image-based warping. In this paper, we propose a novel low-bandwidth neural compression approach for high-fidelity portrait video conferencing that uses implicit radiance fields to achieve both objectives. We leverage dynamic neural radiance fields to reconstruct a high-fidelity talking head from expression features, which are transmitted in place of the frames themselves. The overall system employs a deep model to encode expression features at the sender and reconstructs the portrait at the receiver, with volume rendering as the decoder, for ultra-low bandwidth. In particular, owing to the characteristics of the neural radiance field based model, our compression approach is resolution-agnostic: the low bandwidth it achieves is independent of video resolution, while fidelity is maintained for higher-resolution reconstruction. Experimental results demonstrate that our framework can (1) enable ultra-low-bandwidth video conferencing, (2) maintain high-fidelity portraits, and (3) outperform previous works on high-resolution video compression.
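
The resolution-agnostic claim can be illustrated with a small back-of-the-envelope sketch. It assumes that one fixed-length expression-feature vector is sent per frame. The feature dimension, bit depth, and frame rate below are hypothetical values chosen only for illustration and are not taken from the paper. The raw-frame figure is uncompressed pixel data, shown only to contrast how the two quantities scale; it is not a codec bitrate.

```python
def feature_kbps(feature_dim: int, bits_per_value: int, fps: float) -> float:
    """Bitrate (kbit/s) of sending one expression-feature vector per frame.

    The output resolution does not appear in this formula: the receiver
    renders the portrait at whatever resolution it chooses.
    """
    return feature_dim * bits_per_value * fps / 1000.0


def raw_frame_kbps(width: int, height: int, fps: float, bits_per_pixel: int = 24) -> float:
    """Bitrate (kbit/s) of uncompressed RGB frames, which grows with pixel count."""
    return width * height * bits_per_pixel * fps / 1000.0


# Hypothetical settings (assumptions, not values reported by the authors).
FEATURE_DIM = 64      # length of the per-frame expression vector
BITS_PER_VALUE = 16   # e.g. half-precision floats
FPS = 25.0

if __name__ == "__main__":
    for width, height in [(256, 256), (512, 512), (1024, 1024)]:
        sent = feature_kbps(FEATURE_DIM, BITS_PER_VALUE, FPS)
        raw = raw_frame_kbps(width, height, FPS)
        print(f"{width}x{height}: features {sent:.1f} kbit/s, raw pixels {raw:.1f} kbit/s")
```

Under these assumptions the feature bitrate stays constant across all three resolutions, while the raw pixel rate quadruples each time the width and height double.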

URL

https://arxiv.org/abs/2402.16599

PDF

https://arxiv.org/pdf/2402.16599.pdf
