Paper Reading AI Learner

MobileHolo: A Lightweight Complex-Valued Deformable CNN for High-Quality Computer-Generated Hologram

2025-06-17 14:02:41
Xie Shuyang, Zhou Jie, Xu Bo, Wang Jun, Xu Renjing

Abstract

Holographic displays have significant potential in virtual and augmented reality owing to their ability to provide all depth cues. Deep learning-based methods play an important role in computer-generated holograms (CGH). During the diffraction process, each pixel influences the reconstructed image. However, previous works struggle to capture sufficient information to model this process accurately, primarily because their effective receptive field (ERF) is inadequate. Here, we designed a complex-valued deformable convolution for integration into the network, enabling dynamic adjustment of the convolution kernel's shape and thus increasing the flexibility of the ERF for better feature extraction. This approach allows a single model to achieve state-of-the-art performance in both simulated and optical-experiment reconstructions, surpassing existing open-source models. Specifically, at a resolution of 1920$\times$1072, our method achieves a peak signal-to-noise ratio 2.04 dB, 5.31 dB, and 9.71 dB higher than that of CCNN-CGH, HoloNet, and Holo-encoder, respectively, while using only about one-eighth as many parameters as CCNN-CGH.
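The distinguishing feature of a complex-valued layer is that every multiply-accumulate in the convolution is a complex multiply, so the layer can act on the amplitude and phase of the optical field jointly. Below is a minimal NumPy sketch of that arithmetic only; it is illustrative, not the authors' implementation, and it omits the deformable part (the paper's module additionally learns offsets that shift the kernel's sampling positions, as in deformable convolution). The helper name `complex_conv2d` is hypothetical.

```python
import numpy as np

def complex_conv2d(x, w):
    """Valid-mode 2D correlation of a complex field x with a complex kernel w.

    A complex convolution decomposes into four real convolutions:
    (xr + i*xi) * (wr + i*wi) = (xr*wr - xi*wi) + i*(xr*wi + xi*wr)
    """
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=complex)
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            pr = xr[r:r + kh, c:c + kw]  # real part of the input patch
            pi = xi[r:r + kh, c:c + kw]  # imaginary part of the input patch
            real = (pr * wr - pi * wi).sum()
            imag = (pr * wi + pi * wr).sum()
            out[r, c] = real + 1j * imag
    return out
```

A deformable variant would replace the fixed patch `x[r:r+kh, c:c+kw]` with values interpolated at learned, per-position offsets, which is what lets the ERF adapt its shape to the diffraction pattern.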

URL

https://arxiv.org/abs/2506.14542

PDF

https://arxiv.org/pdf/2506.14542.pdf
