Abstract
Holographic displays have significant potential in virtual and augmented reality because they can provide all depth cues. Deep learning-based methods play an important role in computer-generated holography (CGH). During diffraction, each pixel influences the reconstructed image. However, previous works struggle to capture enough information to model this process accurately, primarily because their effective receptive field (ERF) is inadequate. Here, we design a complex-valued deformable convolution and integrate it into the network, which dynamically adjusts the shape of the convolution kernel to make the ERF more flexible and improve feature extraction. This approach allows a single model to achieve state-of-the-art performance in both simulated and optical reconstructions, surpassing existing open-source models. Specifically, at a resolution of 1920$\times$1072, our method achieves a peak signal-to-noise ratio (PSNR) that is 2.04 dB, 5.31 dB, and 9.71 dB higher than that of CCNN-CGH, HoloNet, and Holo-encoder, respectively. Our model has only about one-eighth as many parameters as CCNN-CGH.
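To make the reported dB gaps easier to interpret, the following is a minimal sketch of the standard peak signal-to-noise ratio definition, PSNR = 10 log10(peak^2 / MSE). It is not the authors' evaluation code: the function names `psnr` and `mse_ratio_for_gain`, the flat-list image representation, the `peak=1.0` normalization, and the toy values are all assumptions made for illustration.

```python
import math

def psnr(reference, reconstruction, peak=1.0):
    """PSNR in dB between two equal-length sequences of intensities in [0, peak].

    Hypothetical helper: images are flattened lists and peak=1.0 is assumed.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db (same peak value)."""
    return 10.0 ** (gain_db / 10.0)

# Toy target and reconstruction, purely illustrative values.
target = [0.0, 0.5, 1.0, 0.5]
recon = [0.1, 0.5, 0.9, 0.5]
print("PSNR (dB):", psnr(target, recon))

# PSNR gains quoted in the abstract, converted to MSE reduction factors.
for name, gain in [("CCNN-CGH", 2.04), ("HoloNet", 5.31), ("Holo-encoder", 9.71)]:
    print(name, round(mse_ratio_for_gain(gain), 2))
```

Under this definition, and assuming every model is scored against the same peak value, a gain of 2.04 dB corresponds to a mean squared error roughly 1.6 times lower, and a gain of 10 dB to one exactly 10 times lower. The abstract does not state how PSNR was averaged over the test set, so the conversion is only indicative.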
URL
https://arxiv.org/abs/2506.14542