Abstract
Computer-generated holography (CGH) presents a transformative solution for near-eye displays in augmented and virtual reality. Recent advances in deep learning have greatly improved CGH in reconstruction quality and computational efficiency. However, deploying neural CGH pipelines directly on compact, eyeglass-style devices is hindered by stringent constraints on computation and energy consumption, while cloud offloading followed by transmission with natural image codecs often distorts phase information and requires high bandwidth to maintain reconstruction quality. Neural compression methods can reduce bandwidth but impose heavy neural decoders at the edge, increasing inference latency and hardware demands. In this work, we introduce JPEG-Inspired Cloud-Edge Holography, an efficient pipeline designed around a learnable transform codec that retains the block-structured and hardware-friendly nature of JPEG. Our system shifts all heavy neural processing to the cloud, while the edge device performs only lightweight decoding without any neural inference. To further improve throughput, we implement custom CUDA kernels for entropy coding on both cloud and edge. This design achieves a peak signal-to-noise ratio of 32.15 dB at $<$ 2 bits per pixel with decode latency as low as 4.2 ms. Both numerical simulations and optical experiments confirm the high reconstruction quality of the holograms. By aligning CGH with a codec that preserves JPEG's structural efficiency while extending it with learnable components, our framework enables low-latency, bandwidth-efficient hologram streaming on resource-constrained wearable devices, using only simple block-based decoding readily supported by modern system-on-chips, without requiring neural decoders or specialized hardware.
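The edge-side decode the abstract describes is JPEG-like: dequantize each coefficient block, then apply an inverse transform, with no neural inference. The paper's actual codec parameters are not given here, so the sketch below is only a minimal illustration of that block-structured decode path, with a placeholder quantization table and a placeholder learned inverse basis (a random orthonormal matrix standing in for JPEG's inverse DCT):

```python
import numpy as np

BLOCK = 8  # JPEG-style 8x8 block size

# Hypothetical stand-ins for the learned codec parameters described in the
# abstract: a per-coefficient quantization table and a learned inverse
# transform basis (here a random orthonormal matrix, not the paper's weights).
rng = np.random.default_rng(0)
quant_table = np.linspace(1.0, 8.0, BLOCK * BLOCK).reshape(BLOCK, BLOCK)
inv_basis = np.linalg.qr(rng.standard_normal((BLOCK * BLOCK, BLOCK * BLOCK)))[0]

def decode_block(q_coeffs: np.ndarray) -> np.ndarray:
    """Dequantize one 8x8 coefficient block, then apply the inverse transform."""
    coeffs = q_coeffs * quant_table                      # dequantization
    return (inv_basis @ coeffs.ravel()).reshape(BLOCK, BLOCK)

def decode_phase(q_blocks: np.ndarray, h: int, w: int) -> np.ndarray:
    """Reassemble an h x w phase hologram from a grid of quantized blocks.

    q_blocks has shape (h // BLOCK, w // BLOCK, BLOCK, BLOCK), i.e. one
    quantized coefficient block per output tile.
    """
    out = np.empty((h, w))
    for i in range(h // BLOCK):
        for j in range(w // BLOCK):
            out[i * BLOCK:(i + 1) * BLOCK,
                j * BLOCK:(j + 1) * BLOCK] = decode_block(q_blocks[i, j])
    return out

# Example: decode a 16x16 phase map from random quantized coefficients.
q = rng.integers(-4, 5, size=(2, 2, BLOCK, BLOCK)).astype(np.float64)
phase = decode_phase(q, 16, 16)
print(phase.shape)
```

Each block decodes independently (one table lookup-style multiply plus one small matrix product), which is why this kind of decode maps onto fixed-function or SIMD paths on commodity SoCs without a neural decoder.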
URL
https://arxiv.org/abs/2512.12367