Paper Reading AI Learner

Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields

2024-02-26 14:29:13
Yifei Li, Xiaohong Liu, Yicong Peng, Guangtao Zhai, Jun Zhou

Abstract

Video conferencing has attracted considerably more attention recently. High fidelity and low bandwidth are the two major objectives of video compression for video conferencing applications. Most pioneering methods rely on classic video compression codecs without high-level feature embedding and thus cannot reach extremely low bandwidth. Recent works instead employ model-based neural compression to achieve ultra-low bitrates using sparse representations of each frame, such as facial landmark information, but these approaches cannot maintain high fidelity because they rely on 2D image-based warping. In this paper, we propose a novel low-bandwidth neural compression approach for high-fidelity portrait video conferencing that uses implicit radiance fields to achieve both objectives. We leverage dynamic neural radiance fields to reconstruct high-fidelity talking heads from expression features, which are transmitted in place of the frames themselves. The overall system employs a deep model to encode expression features at the sender and reconstructs the portrait at the receiver, using volume rendering as the decoder, to achieve ultra-low bandwidth. In particular, owing to the properties of the neural-radiance-field-based model, our compression approach is resolution-agnostic: the low bandwidth it achieves is independent of video resolution, while fidelity is maintained for higher-resolution reconstruction. Experimental results demonstrate that our framework can (1) enable ultra-low-bandwidth video conferencing, (2) maintain high-fidelity portraits, and (3) outperform previous works on high-resolution video compression.
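Why the bandwidth is resolution-agnostic can be illustrated with a minimal sketch. It assumes a hypothetical 64-dimensional expression vector sent as float16 at 30 fps, and it replaces the paper's learned dynamic radiance field with a toy, hypothetical stand-in (`toy_field`). The names `encode_frame`, `decode_frame`, `toy_field` and `render` are illustrative, not from the paper. The receiver composites each ray with the standard NeRF volume-rendering quadrature; the paper's exact renderer may differ.

```python
import math
import struct
from typing import List, Sequence, Tuple

# Hypothetical values for illustration only; the abstract does not give them.
EXPR_DIM = 64  # assumed length of the per-frame expression feature vector
FPS = 30       # assumed frame rate

Color = Tuple[float, float, float]


def encode_frame(expr: Sequence[float]) -> bytes:
    """Sender (sketch): pack the expression vector as little-endian float16.
    The payload length depends only on len(expr), never on video resolution."""
    return struct.pack(f"<{len(expr)}e", *expr)


def decode_frame(payload: bytes) -> List[float]:
    """Receiver (sketch): unpack the float16 expression vector."""
    n = len(payload) // 2
    return list(struct.unpack(f"<{n}e", payload))


def bitrate_kbps(payload_bytes: int, fps: float) -> float:
    """Bitrate of the feature stream in kilobits per second."""
    return payload_bytes * 8 * fps / 1000.0


def composite_ray(sigmas: Sequence[float], colors: Sequence[Color],
                  deltas: Sequence[float]) -> List[float]:
    """Standard NeRF quadrature along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # light surviving past this segment
    return rgb


def toy_field(u: float, v: float, t: float,
              expr: Sequence[float]) -> Tuple[float, Color]:
    """Hypothetical stand-in for a learned, expression-conditioned radiance
    field: returns (density, color) at image position (u, v) and depth t."""
    mean = sum(expr) / len(expr)
    return 1.0 + abs(mean), (u, v, t)


def render(expr: Sequence[float], height: int, width: int,
           samples: int = 8) -> List[List[List[float]]]:
    """Receiver (sketch): cast one ray per output pixel through the field.
    Any height/width can be requested from the same transmitted expr."""
    delta = 1.0 / samples  # uniform depth samples in [0, 1]
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            u, v = (x + 0.5) / width, (y + 0.5) / height
            sigmas, colors = [], []
            for i in range(samples):
                sigma, color = toy_field(u, v, (i + 0.5) * delta, expr)
                sigmas.append(sigma)
                colors.append(color)
            row.append(composite_ray(sigmas, colors, [delta] * samples))
        image.append(row)
    return image


if __name__ == "__main__":
    expr = [0.01 * i for i in range(EXPR_DIM)]
    payload = encode_frame(expr)
    print(len(payload), "bytes/frame,", bitrate_kbps(len(payload), FPS), "kbps")
    for h, w in [(18, 32), (36, 64)]:
        img = render(decode_frame(payload), h, w)
        print(f"{len(img[0])}x{len(img)} frame from the same {len(payload)}-byte payload")
```

Both output sizes are rendered from the same fixed-size payload; only the receiver's rendering cost grows with resolution. In the paper, the field is a learned dynamic NeRF and the sender-side encoder is a deep model, neither of which is reproduced here.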

URL

https://arxiv.org/abs/2402.16599

PDF

https://arxiv.org/pdf/2402.16599.pdf
