Hyperspectral Imaging (HSI) is used in a wide range of applications such as remote sensing, yet transmitting HS images over communication data links is challenging because of the large number of spectral bands they contain together with the limited data bandwidth available in real applications. Compressive Sensing reduces the images by randomly subsampling the spectral bands of each spatial pixel and then reconstructs all the bands using recovery algorithms that impose sparsity in a certain transform domain. Since image pixels are not strictly sparse, this work studies a data-sparsification pre-processing stage prior to compression that ensures the sparsity of the pixels. The sparsified images are compressed $2.5\times$ and then recovered using the Generalized Orthogonal Matching Pursuit (gOMP) algorithm, characterized by high accuracy, low computational requirements, and fast convergence. Experiments are performed on five conventional hyperspectral images, studying the effect of different sparsification levels on the quality of both the uncompressed and the recovered images. It is concluded that gOMP reconstructs the hyperspectral images with higher accuracy and faster convergence when the pixels are highly sparsified, at the expense of reducing the quality of the recovered images with respect to the originals.
https://arxiv.org/abs/2401.14786
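The per-pixel pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming hard thresholding as the sparsification step and a random Gaussian matrix as the measurement operator; the function names and these choices are illustrative, not necessarily the paper's exact procedure.

```python
import numpy as np

def sparsify_pixel(spectrum, k):
    """Sparsification pre-processing: keep only the k largest-magnitude
    entries of a pixel's spectral vector and zero out the rest."""
    out = np.zeros_like(spectrum)
    keep = np.argsort(np.abs(spectrum))[-k:]   # indices of the k largest
    out[keep] = spectrum[keep]
    return out

def compress_pixel(spectrum, m, rng):
    """Compressive measurement: project the B-band spectrum down to
    m < B random measurements (m = B / 2.5 gives 2.5x compression)."""
    B = spectrum.size
    Phi = rng.standard_normal((m, B)) / np.sqrt(m)  # sensing matrix
    return Phi @ spectrum, Phi

rng = np.random.default_rng(0)
x = rng.standard_normal(100)          # one 100-band pixel (synthetic)
xs = sparsify_pixel(x, 10)            # sparsification level k = 10
y, Phi = compress_pixel(xs, 40, rng)  # 2.5x: 100 bands -> 40 measurements
```

The sparsification level `k` trades reconstruction fidelity for recoverability, mirroring the accuracy-versus-quality trade-off the abstract reports.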
Hyperspectral Imaging produces excessive amounts of data, leading to significant challenges in data processing, storage, and transmission. Compressive Sensing has been used in the field of Hyperspectral Imaging as a technique to compress this large amount of data. This work addresses the recovery of hyperspectral images compressed $2.5\times$. A comparative study of the accuracy and performance of the convex FISTA/ADMM and the greedy gOMP/BIHT/CoSaMP recovery algorithms is presented. The results indicate that all algorithms successfully recover the compressed data, yet the gOMP algorithm achieves superior accuracy and faster recovery than the others, at the expense of a strong dependence on the unknown sparsity level of the data to recover.
https://arxiv.org/abs/2401.14762
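The core gOMP iteration compared above (select several atoms per round instead of OMP's one, then refit by least squares) can be sketched as follows. This is a generic textbook-style illustration, not the authors' implementation; the sparsity argument `k` makes explicit the dependence on an assumed sparsity level that the abstract notes.

```python
import numpy as np

def gomp(y, A, k, N=3):
    """Generalized OMP: per iteration, add the N columns of A most
    correlated with the residual, then least-squares refit on the
    enlarged support. k is the assumed sparsity level, which bounds
    the number of selection rounds."""
    n = A.shape[1]
    support, r = [], y.copy()
    x_s = np.zeros(0)
    for _ in range(k):
        corr = np.abs(A.T @ r)
        if support:
            corr[support] = -np.inf          # never re-select an atom
        support.extend(np.argsort(corr)[-N:].tolist())
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s
        if np.linalg.norm(r) < 1e-8:         # early exit on convergence
            break
    x = np.zeros(n)
    x[support] = x_s
    return x
```

Underestimating `k` stops the selection before the true support is covered, which is one way the reported sensitivity to the unknown sparsity level manifests.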
Video Captioning (VC) is a challenging multi-modal task since it requires describing the scene in language by understanding various and complex videos. For machines, traditional VC follows the "imaging-compression-decoding-and-then-captioning" pipeline, where compression is pivotal for storage and transmission. However, such a pipeline has some inevitable shortcomings, i.e., information redundancy resulting in low efficiency, and information loss during the sampling process for captioning. To address these problems, in this paper we propose a novel VC pipeline that generates captions directly from the compressed measurement, which can be captured by a snapshot compressive-sensing camera; we dub our model SnapCap. More specifically, benefiting from signal simulation, we can obtain abundant measurement-video-annotation data pairs for our model. Besides, to better extract language-related visual representations from the compressed measurement, we propose to distill knowledge from videos via a pre-trained CLIP, with its plentiful language-vision associations, to guide the learning of our SnapCap. To demonstrate the effectiveness of SnapCap, we conduct experiments on two widely used VC datasets. Both the qualitative and quantitative results verify the superiority of our pipeline over conventional VC pipelines. In particular, compared to "caption-after-reconstruction" methods, our SnapCap runs at least 3$\times$ faster and achieves better caption results.
https://arxiv.org/abs/2401.04903
Compressive sensing (CS) is a technique that enables the recovery of sparse signals using fewer measurements than traditional sampling methods. To address the computational challenges of CS reconstruction, our objective is to develop an interpretable and concise neural network model for reconstructing natural images using CS. We achieve this by mapping one step of the iterative shrinkage thresholding algorithm (ISTA) to a deep network block, representing one iteration of ISTA. To enhance learning ability and incorporate structural diversity, we integrate aggregated residual transformations (ResNeXt) and squeeze-and-excitation (SE) mechanisms into the ISTA block. This block serves as a deep equilibrium layer, connected to a semi-tensor product network (STP-Net) for convenient sampling and providing an initial reconstruction. The resulting model, called MsDC-DEQ-Net, exhibits competitive performance compared to state-of-the-art network-based methods. It significantly reduces storage requirements compared to deep unrolling methods, using only one iteration block instead of multiple iterations. Unlike deep unrolling models, MsDC-DEQ-Net can be iteratively used, gradually improving reconstruction accuracy while considering computation trade-offs. Additionally, the model benefits from multi-scale dilated convolutions, further enhancing performance.
https://arxiv.org/abs/2401.02884
Incorporating prior information into inverse problems, e.g. via maximum-a-posteriori estimation, is an important technique for facilitating robust inverse problem solutions. In this paper, we devise two novel approaches for linear inverse problems that permit problem-specific statistical prior selections within the compound Gaussian (CG) class of distributions. The CG class subsumes many commonly used priors in signal and image reconstruction methods including those of sparsity-based approaches. The first method developed is an iterative algorithm, called generalized compound Gaussian least squares (G-CG-LS), that minimizes a regularized least squares objective function where the regularization enforces a CG prior. G-CG-LS is then unrolled, or unfolded, to furnish our second method, which is a novel deep regularized (DR) neural network, called DR-CG-Net, that learns the prior information. A detailed computational theory on convergence properties of G-CG-LS and thorough numerical experiments for DR-CG-Net are provided. Due to the comprehensive nature of the CG prior, these experiments show that our unrolled DR-CG-Net outperforms competitive prior art methods in tomographic imaging and compressive sensing, especially in challenging low-training scenarios.
https://arxiv.org/abs/2311.17248
The Underwater Sound Speed Profile (SSP) distribution has a great influence on the propagation mode of acoustic signals, so fast and accurate estimation of the SSP is of great importance in building underwater observation systems. State-of-the-art SSP inversion methods include frameworks of matched field processing (MFP), compressive sensing (CS), and feedforward neural networks (FNN), among which the FNN shows better real-time performance while maintaining the same level of accuracy. However, training an FNN requires a large number of historical SSP samples, which is difficult to satisfy in many ocean areas. This situation is called few-shot learning. To tackle this issue, we propose a multi-task learning (MTL) model with partial parameter sharing among different training tasks. Through MTL, common features can be extracted, thus accelerating the learning process on given tasks and reducing the demand for reference samples, so as to enhance the generalization ability in few-shot learning. To verify the feasibility and effectiveness of MTL, a deep-ocean experiment was conducted in April 2023 in the South China Sea. Results show that MTL outperforms the state-of-the-art methods in accuracy for SSP inversion, while inheriting the real-time advantage of the FNN during the inversion stage.
https://arxiv.org/abs/2310.11708
Sparse modeling is an evident manifestation of the parsimony principle just described, and sparse models are widespread in statistics, physics, the information sciences, neuroscience, computational mathematics, and beyond. In statistics, the many applications of sparse modeling span regression, classification, graphical model selection, sparse M-estimators, and sparse dimensionality reduction. It is also particularly effective in many statistical and machine learning areas where the primary goal is to discover predictive patterns in data that enhance our understanding and control of the underlying physical, biological, and other natural processes, beyond merely building accurate black-box predictors. Common examples include selecting biomarkers in biological procedures, finding brain activity locations that are predictive of brain states and processes from fMRI data, and identifying network bottlenecks that best explain end-to-end performance. Moreover, research into and applications of the efficient recovery of high-dimensional sparse signals from a relatively small number of observations, the main focus of compressed (or compressive) sensing, have grown rapidly into an intensely studied area well beyond classical signal processing. Interestingly, sparse modeling is also directly related to various artificial vision tasks, such as image denoising, segmentation, restoration, and super-resolution; object or face detection and recognition in visual scenes; and action recognition. In this manuscript, we provide a brief introduction to the basic theory underlying sparse representation and compressive sensing, then discuss methods for efficiently recovering sparse solutions to optimization problems, together with applications of sparse recovery in the machine learning problem known as sparse dictionary learning.
https://arxiv.org/abs/2308.13960
We present a novel approach to implement compressive sensing in laser scanning microscopes (LSM), specifically in image scanning microscopy (ISM), using a single-photon avalanche diode (SPAD) array detector. Our method addresses two significant limitations in applying compressive sensing to LSM: the time to compute the sampling matrix and the quality of reconstructed images. We employ a fixed sampling strategy, skipping alternate rows and columns during data acquisition, which reduces the number of points scanned by a factor of four and eliminates the need to compute different sampling matrices. By exploiting the parallel images generated by the SPAD array, we improve the quality of the reconstructed compressive-ISM images compared to standard compressive confocal LSM images. Our results demonstrate the effectiveness of our approach in producing higher-quality images with reduced data acquisition time and potential benefits in reducing photobleaching.
https://arxiv.org/abs/2307.09841
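The fixed sampling strategy above (skip alternate rows and columns, a 4x reduction in scanned points) amounts to a static boolean mask over the scan grid; a minimal sketch, with the function name being illustrative:

```python
import numpy as np

def fixed_scan_mask(h, w):
    """Fixed compressive-scan mask: visit only every other row and
    every other column, i.e. 1/4 of the pixel grid, so no per-frame
    sampling matrix ever needs to be computed."""
    mask = np.zeros((h, w), dtype=bool)
    mask[::2, ::2] = True
    return mask

mask = fixed_scan_mask(8, 8)   # 16 of 64 positions scanned
```

Because the mask is fixed, the time otherwise spent computing a fresh sampling matrix per acquisition is eliminated, which is the first limitation the abstract targets.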
Video Compressed Sensing (VCS) aims to reconstruct multiple frames from one single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, state-of-the-art (SOTA) methods significantly increase model complexity and suffer from poor generality and robustness, meaning that these networks need to be retrained to accommodate a new system. Such limitations hinder real-time imaging and the practical deployment of models. In this work, we propose a Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) for efficient and robust VCS reconstruction. Under the optimization-inspired deep unfolding framework, a lightweight and efficient U-net is exploited to downsize the model while improving overall performance. Moreover, prior knowledge from the sampling model is utilized to dynamically modulate the network features, enabling a single SPA-DUN to handle arbitrary sampling settings and augmenting interpretability and generality. Extensive experiments on both simulated and real datasets demonstrate that SPA-DUN is not only applicable to various sampling settings with one single model but also achieves SOTA performance with remarkable efficiency.
https://arxiv.org/abs/2307.07291
In this work, we propose a novel approach called Operational Support Estimator Networks (OSENs) for the support estimation task. Support Estimation (SE) is defined as finding the locations of the non-zero elements in a sparse signal. By its very nature, the mapping between the measurement and the sparse signal is a non-linear operation. Traditional support estimators rely on computationally expensive iterative signal recovery techniques to achieve such non-linearity. In contrast to convolutional layers, the proposed OSEN approach consists of operational layers that can learn such complex non-linearities without the need for deep networks. In this way, the performance of non-iterative support estimation is greatly improved. Moreover, the operational layers comprise so-called generative \textit{super neurons} with non-local kernels. The kernel location for each neuron/feature map is optimized jointly for the SE task during training. We evaluate the OSENs in three different applications: i. support estimation from Compressive Sensing (CS) measurements, ii. representation-based classification, and iii. learning-aided CS reconstruction, where the output of the OSENs is used as prior knowledge by the CS algorithm for enhanced reconstruction. Experimental results show that the proposed approach achieves computational efficiency and outperforms competing methods by a significant margin, especially at low measurement rates. The software implementation is publicly shared at this https URL.
https://arxiv.org/abs/2307.06065
The deep unfolding network (DUN), which unfolds an optimization algorithm into a deep neural network, has achieved great success in compressive sensing (CS) due to its good interpretability and high performance. Each stage in a DUN corresponds to one iteration of the optimization. At test time, all sampled images generally need to be processed by all stages, which comes at the price of a heavy computational burden and is also unnecessary for images whose contents are easier to restore. In this paper, we focus on CS reconstruction and propose a novel Dynamic Path-Controllable Deep Unfolding Network (DPC-DUN). DPC-DUN, with our designed path-controllable selector, can dynamically select a rapid and appropriate route for each image, and is slimmable by regulating different performance-complexity tradeoffs. Extensive experiments show that DPC-DUN is highly flexible and provides excellent performance and dynamic adjustment to reach a suitable tradeoff, thus addressing the main requirements for practical appeal. Codes are available at this https URL.
https://arxiv.org/abs/2306.16060
Compressive sensing (CS) reconstructs images from sub-Nyquist measurements by solving a sparsity-regularized inverse problem. Traditional CS solvers use iterative optimizers with hand-crafted sparsifiers, while early data-driven methods directly learn an inverse mapping from the low-dimensional measurement space to the original image space. The latter outperforms the former, but is restricted to a pre-defined measurement domain. More recently, deep unrolling methods combine traditional proximal gradient methods and data-driven approaches to iteratively refine an image approximation. To achieve higher accuracy, it has also been suggested to learn both the sampling matrix and the choice of measurement vectors adaptively. Contrary to the current trend, in this work we hypothesize that a general inverse mapping from a random set of compressed measurements to the image domain exists for a given measurement basis, and can be learned. Such a model is single-shot, non-restrictive, and does not parametrize the sampling process. To this end, we propose MOSAIC, a novel compressive sensing framework that reconstructs images given any random selection of measurements sampled using a fixed basis. Motivated by the uneven distribution of information across measurements, MOSAIC incorporates an embedding technique to efficiently apply attention mechanisms on an encoded sequence of measurements, while dispensing with the need for unrolled deep networks. A range of experiments validates the proposed architecture as a promising alternative to existing CS reconstruction methods, achieving state-of-the-art reconstruction accuracy on standard datasets.
https://arxiv.org/abs/2306.00906
For solving linear inverse problems, particularly of the type that appears in tomographic imaging and compressive sensing, this paper develops two new approaches. The first is an iterative algorithm that minimizes a regularized least squares objective function in which the regularization is based on a compound Gaussian prior distribution. The compound Gaussian prior subsumes many of the commonly used priors in image reconstruction, including those of sparsity-based approaches. The developed iterative algorithm gives rise to the paper's second new approach, a deep neural network that corresponds to an "unrolling" or "unfolding" of the iterative algorithm. Unrolled deep neural networks have interpretable layers and outperform standard deep learning methods. This paper includes a detailed computational theory that provides insight into the construction and performance of both algorithms. The conclusion is that both algorithms outperform other state-of-the-art approaches to tomographic image formation and compressive sensing, especially in the difficult regime of low training.
https://arxiv.org/abs/2305.11120
The use of deep unfolding networks in compressive sensing (CS) has seen wide success as they provide both simplicity and interpretability. However, since most deep unfolding networks are iterative, this incurs significant redundancies in the network. In this work, we propose a novel recursion-based framework to enhance the efficiency of deep unfolding models. First, recursions are used to effectively eliminate the redundancies in deep unfolding networks. Secondly, we randomize the number of recursions during training to decrease the overall training time. Finally, to effectively utilize the power of recursions, we introduce a learnable unit to modulate the features of the model based on both the total number of iterations and the current iteration index. To evaluate the proposed framework, we apply it to both ISTA-Net+ and COAST. Extensive testing shows that our proposed framework allows the network to cut down as much as 75% of its learnable parameters while mostly maintaining its performance, and at the same time, it cuts around 21% and 42% from the training time for ISTA-Net+ and COAST respectively. Moreover, when presented with a limited training dataset, the recursive models match or even outperform their respective non-recursive baseline. Codes and pretrained models are available at this https URL .
https://arxiv.org/abs/2305.05505
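The recursion-with-randomized-depth idea can be sketched framework-agnostically in plain Python with illustrative names; the actual models are ISTA-Net+/COAST variants with a learnable modulation unit conditioned on the iteration index, which is omitted here.

```python
import random

def run_recursive(block, x, n_recursions):
    """Apply one shared-weight block n_recursions times; recursion
    replaces stacking distinct iteration stages, eliminating the
    parameter redundancy of a fully unrolled network."""
    for _ in range(n_recursions):
        x = block(x)
    return x

def train_step(block, x, max_rec=5):
    """During training, randomize the recursion count so a single
    model is exposed to (and later usable at) many different depths,
    which also shortens the average per-step training cost."""
    n = random.randint(1, max_rec)
    return run_recursive(block, x, n)
```

At test time the recursion count becomes a knob for trading reconstruction quality against compute, without retraining.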
Deep learning has been applied successfully to compressive sensing (CS) of images in recent years. However, existing network-based methods are often trained as black boxes, in which the lack of prior knowledge is often the bottleneck for further performance improvement. To overcome this drawback, this paper proposes a novel CS method using a non-local prior, called NL-CS Net, which combines the interpretability of traditional optimization methods with the speed of network-based methods. We unroll each phase of an iteration of the augmented Lagrangian method, solving a non-local and sparsity-regularized optimization problem, into a network. NL-CS Net is composed of an up-sampling module and a recovery module. In the up-sampling module, we use a learnable up-sampling matrix instead of a predefined one. In the recovery module, a patch-wise non-local network is employed to capture long-range feature correspondences. The important parameters involved (e.g., the sampling matrix, nonlinear transforms, shrinkage thresholds, and step size) are learned end-to-end rather than hand-crafted. Furthermore, to facilitate practical implementation, orthogonal and binary constraints on the sampling matrix are simultaneously adopted. Extensive experiments on natural images and magnetic resonance imaging (MRI) demonstrate that the proposed method outperforms the state-of-the-art methods while maintaining great interpretability and speed.
https://arxiv.org/abs/2305.03899
By integrating certain optimization solvers with deep neural networks, the deep unfolding network (DUN), with its good interpretability and high performance, has attracted growing attention in compressive sensing (CS). However, existing DUNs often improve visual quality at the price of a large number of parameters, and suffer from feature information loss during iteration. In this paper, we propose an Optimization-inspired Cross-attention Transformer (OCT) module as an iterative process, leading to a lightweight OCT-based Unfolding Framework (OCTUF) for image CS. Specifically, we design a novel Dual Cross Attention (Dual-CA) sub-module, which consists of an Inertia-Supplied Cross Attention (ISCA) block and a Projection-Guided Cross Attention (PGCA) block. The ISCA block introduces multi-channel inertia forces and increases the memory effect via a cross-attention mechanism between adjacent iterations. The PGCA block achieves enhanced information interaction by introducing the inertia force into the gradient descent step through a cross-attention block. Extensive CS experiments show that our OCTUF achieves superior performance compared to state-of-the-art methods while requiring lower training complexity. Codes are available at this https URL.
https://arxiv.org/abs/2304.13986
Deep network-based image and video Compressive Sensing (CS) has attracted increasing attention in recent years. However, existing deep network-based CS methods usually adopt a simply stacked convolutional network, which not only weakens the perception of rich contextual prior knowledge but also limits the exploration of correlations between temporal video frames. In this paper, we propose a novel Hierarchical InTeractive Video CS Reconstruction Network (HIT-VCSNet), which cooperatively exploits deep priors in both the spatial and temporal domains to improve reconstruction quality. Specifically, in the spatial domain, a novel hierarchical structure is designed that hierarchically extracts deep features from keyframes and non-keyframes. In the temporal domain, a novel hierarchical interaction mechanism is proposed that cooperatively learns the correlations among different frames in multiscale space. Extensive experiments demonstrate that the proposed HIT-VCSNet outperforms existing state-of-the-art video and image CS methods by a large margin.
https://arxiv.org/abs/2304.07473
Model-based deep learning methods that combine imaging physics with learned regularization priors have emerged as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include guarantees similar to those of compressive sensing algorithms, including uniqueness, convergence, and stability, while being significantly more memory-efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI in static and dynamic settings.
https://arxiv.org/abs/2304.01351
Deep networks can be trained to map images into a low-dimensional latent space. In many cases, different images in a collection are articulated versions of one another; for example, the same object with different lighting, background, or pose. Furthermore, parts of images are often corrupted by noise or missing entries. In this paper, our goal is to recover images without access to the ground-truth (clean) images, using the articulations as a structural prior on the data. Such recovery problems fall under the domain of compressive sensing. We propose to learn an autoencoder with tensor ring factorization on the embedding space to impose structural constraints on the data. In particular, we use a tensor ring structure in the bottleneck layer of the autoencoder, utilizing the soft labels of the structured dataset. We empirically demonstrate the effectiveness of the proposed approach for inpainting and denoising applications. The resulting method achieves better reconstruction quality than other generative-prior-based self-supervised recovery approaches for compressive sensing.
https://arxiv.org/abs/2303.06235
We study a deep linear network endowed with a structure. It takes the form of a matrix $X$ obtained by multiplying $K$ matrices (called factors and corresponding to the action of the layers). The action of each layer (i.e. a factor) is obtained by applying a fixed linear operator to a vector of parameters satisfying a constraint. The number of layers is not limited. Assuming that $X$ is given and the factors have been estimated, the error between the product of the estimated factors and $X$ (i.e. the reconstruction error) is either the statistical or the empirical risk. In this paper, we provide necessary and sufficient conditions on the network topology under which a stability property holds. The stability property requires that the error on the parameters defining the factors (i.e. the stability of the recovered parameters) scales linearly with the reconstruction error (i.e. the risk). Therefore, under these conditions on the network topology, any successful learning task leads to stably defined features and therefore interpretable layers/network. In order to do so, we first evaluate how the Segre embedding and its inverse distort distances. Then, we show that any deep structured linear network can be cast as a generic multilinear problem (that uses the Segre embedding). This is the {\em tensorial lifting}. Using the tensorial lifting, we provide necessary and sufficient conditions for the identifiability of the factors (up to a scale rearrangement). We finally provide the necessary and sufficient condition called \NSPlong~(because of the analogy with the usual Null Space Property in the compressed sensing framework) which guarantees that the stability property holds. We illustrate the theory with a practical example in which the deep structured linear network is a convolutional linear network. As expected, the conditions are rather strong but not empty. A simple test on the network topology can be implemented to check whether the condition holds.
https://arxiv.org/abs/1703.08044