JPEG-Inspired Cloud-Edge Holography

2025-12-13 15:49:41
Shuyang Xie, Jie Zhou, Jun Wang, Renjing Xu

Abstract

Computer-generated holography (CGH) presents a transformative solution for near-eye displays in augmented and virtual reality. Recent advances in deep learning have greatly improved CGH in reconstructed quality and computational efficiency. However, deploying neural CGH pipelines directly on compact, eyeglass-style devices is hindered by stringent constraints on computation and energy consumption, while cloud offloading followed by transmission with natural image codecs often distorts phase information and requires high bandwidth to maintain reconstruction quality. Neural compression methods can reduce bandwidth but impose heavy neural decoders at the edge, increasing inference latency and hardware demand. In this work, we introduce JPEG-Inspired Cloud-Edge Holography, an efficient pipeline designed around a learnable transform codec that retains the block-structured and hardware-friendly nature of JPEG. Our system shifts all heavy neural processing to the cloud, while the edge device performs only lightweight decoding without any neural inference. To further improve throughput, we implement custom CUDA kernels for entropy coding on both cloud and edge. This design achieves a peak signal-to-noise ratio of 32.15 dB at less than 2 bits per pixel with decode latency as low as 4.2 ms. Both numerical simulations and optical experiments confirm the high reconstruction quality of the holograms. By aligning CGH with a codec that preserves JPEG's structural efficiency while extending it with learnable components, our framework enables low-latency, bandwidth-efficient hologram streaming on resource-constrained wearable devices, using only simple block-based decoding readily supported by modern system-on-chips, without requiring neural decoders or specialized hardware.
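
To make the block-structured, JPEG-like decoding described in the abstract concrete, here is a minimal sketch of a block transform codec. It is an illustration under stated assumptions, not the authors' implementation: the 8x8 block size, the fixed orthonormal DCT (standing in for the paper's learnable transform), the single uniform quantization step `q`, and all function names are hypothetical. The entropy-coding stage, which the abstract says runs in custom CUDA kernels, is left out entirely.

```python
import math

N = 8  # block size borrowed from baseline JPEG (assumption; the abstract only says "block-structured")


def dct_matrix(n=N):
    """Orthonormal DCT-II basis, used here as a fixed stand-in for a learned transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


T = dct_matrix()
T_T = transpose(T)


def encode_block(block, q):
    """Cloud side: forward transform C = T B T^T, then uniform quantization to integers."""
    coeffs = matmul(matmul(T, block), T_T)
    return [[round(c / q) for c in row] for row in coeffs]


def decode_block(symbols, q):
    """Edge side: dequantize and invert the transform (B = T^T C T); no neural inference."""
    coeffs = [[s * q for s in row] for row in symbols]
    return matmul(matmul(T_T, coeffs), T)


# Hypothetical 8x8 block of hologram phase values (radians).
block = [[math.pi * math.sin(0.3 * i + 0.2 * j) for j in range(N)] for i in range(N)]
q = 0.05  # quantization step (assumed value)

symbols = encode_block(block, q)
recon = decode_block(symbols, q)
max_err = max(abs(recon[i][j] - block[i][j]) for i in range(N) for j in range(N))
```

Because the transform is orthonormal, each coefficient's rounding error of at most `q / 2` bounds the reconstruction error of the whole block, so the edge side needs only a dequantize step and two small matrix products per block. A real phase-only hologram codec would also have to handle phase wrapping, which this sketch ignores.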

URL

https://arxiv.org/abs/2512.12367

PDF

https://arxiv.org/pdf/2512.12367.pdf

