MOSAIC: Masked Optimisation with Selective Attention for Image Reconstruction

2023-06-01 17:05:02
Pamuditha Somarathne, Tharindu Wickremasinghe, Amashi Niwarthana, A. Thieshanthan, Chamira U.S. Edussooriya, Dushan N. Wadduwage

Abstract

Compressive sensing (CS) reconstructs images from sub-Nyquist measurements by solving a sparsity-regularized inverse problem. Traditional CS solvers use iterative optimizers with hand-crafted sparsifiers, while early data-driven methods directly learn an inverse mapping from the low-dimensional measurement space to the original image space. The latter outperform the former, but are restricted to a pre-defined measurement domain. More recently, deep unrolling methods have combined traditional proximal gradient methods with data-driven approaches to iteratively refine an image approximation. To achieve higher accuracy, it has also been suggested to learn both the sampling matrix and the choice of measurement vectors adaptively. Contrary to the current trend, in this work we hypothesize that a general inverse mapping from a random set of compressed measurements to the image domain exists for a given measurement basis, and can be learned. Such a model is single-shot, non-restrictive and does not parametrize the sampling process. To this end, we propose MOSAIC, a novel compressive sensing framework to reconstruct images given any random selection of measurements, sampled using a fixed basis. Motivated by the uneven distribution of information across measurements, MOSAIC incorporates an embedding technique to efficiently apply attention mechanisms on an encoded sequence of measurements, while dispensing with the need for unrolled deep networks. A range of experiments validates the proposed architecture as a promising alternative to existing CS reconstruction methods, achieving state-of-the-art reconstruction accuracy on standard datasets.
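
The measurement setting described in the abstract (a random selection of measurements taken in a fixed basis) and the traditional iterative baseline it contrasts with can be illustrated with a small sketch. This is not MOSAIC and not the authors' code: it assumes a 1-D signal that is sparse in the sample domain, a DCT matrix as the fixed measurement basis, and ISTA (a proximal gradient method with an l1 sparsity prior) as a stand-in for the traditional solvers. The function names `dct_basis`, `sample_measurements`, and `ista` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II matrix; each row is one measurement vector.
    (Assumed fixed basis for this sketch, not taken from the paper.)"""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)
    return basis


def sample_measurements(x, basis, m, rng):
    """Keep m randomly selected measurements of x in the fixed basis."""
    idx = rng.choice(basis.shape[0], size=m, replace=False)
    return idx, basis[idx] @ x


def ista(y, A, lam=0.01, n_iter=500):
    """Iterative soft-thresholding for
    min_x 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # l1 prox
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 128, 48, 5

    # Synthetic k-sparse signal (illustrative data only).
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

    basis = dct_basis(n)
    idx, y = sample_measurements(x, basis, m, rng)
    A = basis[idx]

    x_backproj = A.T @ y  # single-shot linear back-projection
    x_ista = ista(y, A)   # iterative sparse recovery

    print("back-projection error:", np.linalg.norm(x_backproj - x))
    print("ISTA error:           ", np.linalg.norm(x_ista - x))
```

The back-projection is the simplest single-shot inverse for a given index set, while ISTA refines the estimate over many iterations. A learned single-shot model of the kind the abstract hypothesizes would aim to replace both, for any random choice of `idx`, without iterating.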

URL

https://arxiv.org/abs/2306.00906

PDF

https://arxiv.org/pdf/2306.00906.pdf

