
Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

2024-04-07 15:53:21
Gang Qu, Ping Wang, Xin Yuan

Abstract

Single-pixel imaging (SPI) is a promising computational imaging technique that produces images by solving an ill-posed reconstruction problem from a few measurements captured by a single-pixel detector. Deep learning has achieved impressive success on SPI reconstruction. However, the poor reconstruction performance and impractical imaging models of previous methods limit its real-world applications. In this paper, we propose a deep unfolding network with a hybrid-attention Transformer on the Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras. Specifically, we unfold the computation graph of the iterative shrinkage-thresholding algorithm (ISTA) into two alternating modules: efficient tensor gradient descent and hybrid-attention multiscale denoising. By virtue of Kronecker SPI, the gradient descent module avoids the high computational overhead of previous gradient descent modules based on vectorized SPI. The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration. Moreover, we build an SPI prototype to verify the effectiveness of the proposed method. Extensive experiments on synthetic and real data demonstrate that our method achieves state-of-the-art performance. The source code and pre-trained models are available at this https URL.
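
The abstract does not state the measurement equations, so the following is only a minimal sketch of one unfolded ISTA stage. It assumes that the Kronecker (separable) SPI model measures a 2D image X as Y = A X B^T. Under that assumption the gradient of 0.5 * ||A X B^T - Y||_F^2 is A^T (A X B^T - Y) B, which involves only small matrix products and never forms the large matrix B ⊗ A that the equivalent vectorized model y = (B ⊗ A) vec(X) would need. The function names, the step size `step`, and the threshold `tau` are hypothetical, and plain soft-thresholding stands in for the learned hybrid-attention denoiser, which is not reproduced here. Plain Python lists are used so the sketch has no dependencies; a practical version would use a tensor library.

```python
import math


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


def measure(X, A, B):
    """Assumed Kronecker-style measurement Y = A X B^T."""
    return matmul(matmul(A, X), transpose(B))


def soft_threshold(X, tau):
    """Element-wise shrinkage (stand-in for the learned denoiser)."""
    return [[math.copysign(max(abs(v) - tau, 0.0), v) for v in row]
            for row in X]


def ista_step(X, Y, A, B, step, tau):
    """One ISTA stage: gradient descent on the data term, then shrinkage."""
    # Residual in measurement space: A X B^T - Y
    residual = [[m - y for m, y in zip(m_row, y_row)]
                for m_row, y_row in zip(measure(X, A, B), Y)]
    # Gradient in image space: A^T (A X B^T - Y) B
    grad = matmul(matmul(transpose(A), residual), B)
    # Gradient descent update
    Z = [[x - step * g for x, g in zip(x_row, g_row)]
         for x_row, g_row in zip(X, grad)]
    # Proximal (denoising) update
    return soft_threshold(Z, tau)


# Toy usage: hypothetical 2x2 image, full sampling with identity matrices.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
X_true = [[1.0, 0.0], [0.0, 2.0]]
Y = measure(X_true, A, B)

X = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(3):
    X = ista_step(X, Y, A, B, step=1.0, tau=0.0)
```

In a deep unfolding network, each call to `ista_step` would correspond to one stage, with `step` learned per stage and the shrinkage replaced by a trained denoising module.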

URL

https://arxiv.org/abs/2404.05001

PDF

https://arxiv.org/pdf/2404.05001.pdf

