We propose a deep probabilistic unfolding model for the classical quantized compressive sensing problem, leveraging an unfolding framework to improve reconstruction accuracy and efficiency. Unlike previous unfolding methods that apply an L2 projection to the measurements, we derive a closed-form, numerically stable likelihood gradient projection, which allows the model to respect the true quantization physics by turning the hard quantization constraint into soft probabilistic guidance. Furthermore, an efficient dual-domain Mamba module is designed to dynamically capture and fuse multi-scale local and global features, ensuring interaction between distant but correlated regions. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance over previous works, which may help promote the application of quantized compressive sensing in practice.
https://arxiv.org/abs/2605.11475
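To make the idea of a likelihood gradient for quantized measurements concrete, here is a minimal sketch for a single scalar measurement. It assumes additive Gaussian noise with standard deviation `sigma` and a quantization bin `[lo, hi)`; the function names and the simple `eps` clamp are illustrative assumptions and are not the numerically stable closed form derived in the paper.

```python
import math

def norm_pdf(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    # Standard normal cumulative distribution via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def quantized_loglik_grad(z, lo, hi, sigma, eps=1e-12):
    """Gradient w.r.t. z of log P(lo <= z + n < hi), with n ~ N(0, sigma^2).

    z is the noiseless measurement (e.g. one entry of A @ x); [lo, hi) is the
    quantization bin that the observed code word corresponds to.
    """
    a = (lo - z) / sigma
    b = (hi - z) / sigma
    # Naive clamp to avoid dividing by zero far from the bin (an assumption,
    # not the stable form referred to in the abstract).
    prob = max(norm_cdf(b) - norm_cdf(a), eps)
    return (norm_pdf(a) - norm_pdf(b)) / (sigma * prob)

# The gradient vanishes at the bin centre and points back towards the bin
# when z lies outside it.
g_center = quantized_loglik_grad(0.5, 0.0, 1.0, 0.1)
g_below = quantized_loglik_grad(-0.5, 0.0, 1.0, 0.1)
g_above = quantized_loglik_grad(1.5, 0.0, 1.0, 0.1)
```

In contrast to an L2 projection, this gradient is flat inside the bin and only pulls the estimate when it drifts towards or beyond a bin edge, which is one way to read "soft probabilistic guidance".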
Recent Deep Unfolding Networks (DUNs) have significantly advanced Compressive Sensing (CS) by integrating iterative optimization with deep networks. However, existing DUNs still suffer from two challenges: 1) Reliance on a single measurement stream, which limits effective information interaction across distinct measurement subsets. 2) Uniform processing of all image regions, which overlooks varying reconstruction difficulties induced by diverse textures. To address these limitations, a novel Dual-Path Hyperprior Informed Deep Unfolding Network (DPH-DUN) is proposed, which partitions the measurements into two subsets to enable hyperprior-guided reconstruction via a dual-path architecture. In the Deep Hyperprior Learning branch, a series of lightweight neural modules are designed to efficiently generate hyperprior knowledge of different domains, enabling collaborative guidance for the CS reconstruction. In the Hyperprior Informed Reconstruction branch, a deep unfolding framework with hyperprior guidance is constructed to iteratively refine the reconstruction. Specifically, i) in the gradient descent step, a Hyperprior Informed Step Size Generation network is designed to dynamically generate spatially varying step maps, enabling adaptive fine-grained gradient updates. ii) In the proximal mapping step, two hyperprior informed attention mechanisms are introduced to dynamically focus on challenging regions via gradient-based hard and soft attention, improving CS reconstruction accuracy. Extensive experiments demonstrate that the proposed DPH-DUN outperforms existing CS methods.
https://arxiv.org/abs/2605.09566
Modern IoT and sensor networks generate vast amounts of data, posing significant challenges for storage, transmission, and real-time processing. Traditional approaches, such as compressive sensing and machine learning-based compression, often suffer from computational inefficiencies and irreversible data loss. This paper introduces Information Density as a quantitative metric to support sensor deployment and enable AI-driven virtual sensing. We propose a framework that leverages spatial, temporal and inter-modal correlations among sensor signals to perform sensing tasks even in the absence of physical sensors. Two complementary measures: (i) Phase in Eigen Space and (ii) Mutual Information, are developed to quantify and assess information density, enabling the selection of optimal sensor configurations across both intra-modality and cross-modality scenarios. Validated using real-world data from Madrid's smart city infrastructure, this framework demonstrates the feasibility of replacing physical sensors with virtual ones under bounded error conditions (e.g., achieving $<3.21\%$ mean error with a single sensor). The results highlight the potential for scalable and energy-efficient sensing systems in smart environments.
https://arxiv.org/abs/2605.08180
Single-pixel imaging (SPI) is a promising imaging modality with distinctive advantages in strongly perturbed environments. Existing SPI methods lack physical sparsity constraints and overlook the integration of local and global features, leading to severe noise vulnerability, structural distortions and blurred details. To address these limitations, we propose SISTA-Net, a compressive sensing-inspired self-supervised method for single-pixel imaging. SISTA-Net unfolds the Iterative Shrinkage-Thresholding Algorithm (ISTA) into an interpretable network consisting of a data fidelity module and a proximal mapping module. The fidelity module adopts a hybrid CNN-Visual State Space Model (VSSM) architecture to integrate local and global feature modeling, enhancing reconstruction integrity and fidelity. We leverage deep nonlinear networks as adaptive sparse transforms combined with a learnable soft-thresholding operator to impose explicit physical sparsity in the latent domain, enabling noise suppression and robustness to interference even at extremely low sampling rates. Extensive experiments on multiple simulation scenarios demonstrate that SISTA-Net outperforms state-of-the-art methods by 2.6 dB in PSNR. Real-world far-field underwater tests yield a 3.4 dB average PSNR improvement, validating its robust anti-interference capability.
https://arxiv.org/abs/2603.29732
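For readers unfamiliar with the algorithm that SISTA-Net unfolds, the classical ISTA iteration alternates a gradient step on the data-fidelity term with a soft-thresholding (proximal) step. The sketch below is plain ISTA on a random sparse recovery problem, with a fixed threshold and an identity sparsifying transform; it is not the learned network from the paper, and the problem sizes and `lam` value are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrink every entry towards zero by tau.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_objective(A, y, x, lam):
    # 0.5 * ||A x - y||^2 + lam * ||x||_1
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam=0.05, n_iter=200):
    # Step size 1/L, where L is the squared spectral norm of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                          # data-fidelity gradient
        x = soft_threshold(x - step * grad, step * lam)   # proximal mapping
    return x

# Toy problem: 40 random measurements of a 3-sparse, length-100 signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = ista(A, y)
```

An unfolded network replaces the fixed loop with a small number of stages and, as the abstract describes, learns the transform and the threshold instead of fixing them by hand.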
Deep unfolding networks (DUNs), which combine conventional iterative optimization algorithms and deep neural networks into a multi-stage framework, have achieved remarkable results in Image Restoration (IR) tasks such as spectral imaging reconstruction and compressive sensing, among others. A DUN unfolds the iterative optimization steps into a stack of sequentially linked blocks. Each block consists of a Gradient Descent Module (GDM) and a Proximal Mapping Module (PMM), which from a Bayesian perspective is equivalent to a denoiser operating on Gaussian noise with a known level. However, existing DUNs suffer from two critical limitations: (i) their PMMs share identical architectures and denoising objectives across stages, ignoring the need for stage-specific adaptation to varying noise levels; and (ii) their chain of structurally repetitive blocks results in severe parameter redundancy and high memory consumption, hindering deployment in large-scale or resource-constrained settings. To address these challenges, we introduce generalized Deep Low-rank Adaptation (LoRA) Unfolding Networks for image restoration, named LoRun, harmonizing denoising objectives and adapting different denoising levels between stages with compressed memory usage for more efficient restoration. LoRun introduces a novel paradigm where a single pretrained base denoiser is shared across all stages, while lightweight, stage-specific LoRA adapters are injected into the PMMs to dynamically modulate denoising behavior according to the noise level at each unfolding stage. This design decouples the core restoration capability from task-specific adaptation, enabling precise control over denoising intensity without duplicating full network parameters and achieving up to $N$ times parameter reduction for an $N$-stage DUN with on-par or better performance. Extensive experiments conducted on three IR tasks validate the efficiency of our method.
https://arxiv.org/abs/2602.18697
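The "up to $N$ times" parameter reduction claimed for LoRun can be illustrated with simple counting. The sketch below compares an $N$-stage DUN with independent denoisers against one shared base denoiser plus a rank-`rank` adapter pair per layer and per stage. The layer shapes, rank, and stage count are hypothetical values chosen for illustration, not figures from the paper.

```python
def dun_param_counts(layer_shapes, n_stages, rank):
    """Return (independent, shared_with_lora) parameter counts.

    layer_shapes: list of (d_out, d_in) weight shapes of one denoiser.
    A LoRA adapter for a (d_out, d_in) weight adds two factors of shapes
    (d_out, rank) and (rank, d_in), i.e. rank * (d_out + d_in) parameters.
    """
    base = sum(d_out * d_in for d_out, d_in in layer_shapes)
    adapter = sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)
    independent = n_stages * base              # every stage owns a full denoiser
    shared_with_lora = base + n_stages * adapter  # one base + per-stage adapters
    return independent, shared_with_lora

# Hypothetical denoiser: ten 64x64 layers, 8 unfolding stages, rank-4 adapters.
independent, shared = dun_param_counts([(64, 64)] * 10, n_stages=8, rank=4)
reduction = independent / shared
```

With these made-up numbers the reduction is 4x rather than 8x, because the adapters are not negligible next to such a small base network; the ratio approaches $N$ only as the adapter size becomes small relative to the base denoiser.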
Deep learning based compressive sensing (CS) methods typically learn sampling operators using convolutional or block wise fully connected layers, which limit receptive fields and scale poorly for high dimensional data. We propose MTSCSNet, a CS framework based on Multiscale Tensor Summation (MTS) factorization, a structured operator for efficient multidimensional signal processing. MTS performs mode-wise linear transformations with multiscale summation, enabling large receptive fields and effective modeling of cross-dimensional correlations. In MTSCSNet, MTS is first used as a learnable CS operator that performs linear dimensionality reduction in tensor space, with its adjoint defining the initial back-projection, and is then applied in the reconstruction stage to directly refine this estimate. This results in a simple feed-forward architecture without iterative or proximal optimization, while remaining parameter and computation efficient. Experiments on standard CS benchmarks show that MTSCSNet achieves state-of-the-art reconstruction performance on RGB images, with notable PSNR gains and faster inference, even compared to recent diffusion-based CS methods, while using a significantly more compact feed-forward architecture.
https://arxiv.org/abs/2602.07056
The rapid proliferation of realistic deepfakes has raised urgent concerns over their misuse, motivating the use of defensive watermarks in synthetic images for reliable detection and provenance tracking. However, this defense paradigm assumes such watermarks are inherently resistant to removal. We challenge this assumption with DeMark, a query-free black-box attack framework that targets defensive image watermarking schemes for deepfakes. DeMark exploits latent-space vulnerabilities in encoder-decoder watermarking models through a compressive sensing based sparsification process, suppressing watermark signals while preserving perceptual and structural realism appropriate for deepfakes. Across eight state-of-the-art watermarking schemes, DeMark reduces watermark detection accuracy from 100% to 32.9% on average while maintaining natural visual quality, outperforming existing attacks. We further evaluate three defense strategies, including image super resolution, sparse watermarking, and adversarial training, and find them largely ineffective. These results demonstrate that current encoder decoder watermarking schemes remain vulnerable to latent-space manipulations, underscoring the need for more robust watermarking methods to safeguard against deepfakes.
https://arxiv.org/abs/2601.16473
Sparse regularization is fundamental in signal processing and feature extraction but often relies on non-differentiable penalties, conflicting with gradient-based optimizers. We propose WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel differentiable regularizer derived from the weakly-convex envelope framework. WEEP provides tunable, unbiased sparsity and a simple closed-form proximal operator, while maintaining full differentiability and L-smoothness, ensuring compatibility with both gradient-based and proximal algorithms. This resolves the tradeoff between statistical performance and computational tractability. We demonstrate superior performance compared to established convex and non-convex sparse regularizers on challenging compressive sensing and image denoising tasks.
https://arxiv.org/abs/2507.20447
Hyperspectral imaging (HSI) is essential across various disciplines for its capacity to capture rich spectral information. However, efficiently reconstructing hyperspectral images from compressive sensing measurements presents significant challenges. To tackle these, we adopt a divide-and-conquer strategy that capitalizes on the unique spectral and spatial characteristics of hyperspectral images. We introduce the Lightweight Separate Spectral Transformer (LSST), an innovative architecture tailored for efficient hyperspectral image reconstruction. This architecture consists of Separate Spectral Transformer Blocks (SSTB) for modeling spectral relationships and Lightweight Spatial Convolution Blocks (LSCB) for spatial processing. The SSTB employs Grouped Spectral Self-attention and a Spectrum Shuffle operation to effectively manage both local and non-local spectral relationships. Simultaneously, the LSCB utilizes depth-wise separable convolutions and strategic ordering to enhance spatial information processing. Furthermore, we implement the Focal Spectrum Loss, a novel loss weighting mechanism that dynamically adjusts during training to improve reconstruction across spectrally complex bands. Extensive testing demonstrates that our LSST achieves superior performance while requiring fewer FLOPs and parameters, underscoring its efficiency and effectiveness. The source code is available at: this https URL.
https://arxiv.org/abs/2601.01064
We present Primal, a deterministic feature mapping framework that harnesses the number-theoretic independence of prime square roots to construct robust, tunable vector representations. Diverging from standard stochastic projections (e.g., Random Fourier Features), our method exploits the Besicovitch property to create irrational frequency modulations that guarantee infinite non-repeating phase trajectories. We formalize two distinct algorithmic variants: (1) StaticPrime, a sequence generation method that produces temporal position encodings empirically approaching the theoretical Welch bound for quasi-orthogonality; and (2) DynamicPrime, a tunable projection layer for input-dependent feature mapping. A central novelty of the dynamic framework is its ability to unify two disparate mathematical utility classes through a single scaling parameter $\sigma$. In the low-frequency regime, the method acts as an isometric kernel map, effectively linearizing non-convex geometries (e.g., spirals) to enable high-fidelity signal reconstruction and compressive sensing. Conversely, the high-frequency regime induces chaotic phase wrapping, transforming the projection into a maximum-entropy one-way hash suitable for Hyperdimensional Computing and privacy-preserving Split Learning. Empirical evaluations demonstrate that our framework yields superior orthogonality retention and distribution tightness compared to normalized Gaussian baselines, establishing it as a computationally efficient, mathematically rigorous alternative to random matrix projections. The code is available at this https URL
https://arxiv.org/abs/2511.20839
Cardiac contraction is a rapid, coordinated process that unfolds across three-dimensional tissue on millisecond timescales. Traditional optical imaging is often inadequate for capturing dynamic cellular structure in the beating heart because of a fundamental trade-off between spatial and temporal resolution. To overcome these limitations, we propose a high-performance computational imaging framework that integrates Compressive Sensing (CS) with Light-Sheet Microscopy (LSM) for efficient, low-phototoxic cardiac imaging. The system performs compressed acquisition of fluorescence signals via random binary mask coding using a Digital Micromirror Device (DMD). We propose a Plug-and-Play (PnP) framework, solved using the alternating direction method of multipliers (ADMM), which flexibly incorporates advanced denoisers, including Tikhonov, Total Variation (TV), and BM3D. To preserve structural continuity in dynamic imaging, we further introduce temporal regularization enforcing smoothness between adjacent z-slices. Experimental results on zebrafish heart imaging under high compression ratios demonstrate that the proposed method successfully reconstructs cellular structures with excellent denoising performance and image clarity, validating the effectiveness and robustness of our algorithm in real-world high-speed, low-light biological imaging scenarios.
https://arxiv.org/abs/2511.03093
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect unseen attacks or detect different attack types with a high level of accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network's deployment, enabling effective real-time adversarial detection. We generate a metric of adversarial presence by comparing the behavior of a compressed/uncompressed neural network pair. Our method has been tested against state-of-the-art techniques, and it achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.
https://arxiv.org/abs/2510.02707
Hyperspectral imaging (HSI) provides rich spatial-spectral information but remains costly to acquire due to hardware limitations and the difficulty of reconstructing three-dimensional data from compressed measurements. Although compressive sensing systems such as CASSI improve efficiency, accurate reconstruction is still challenged by severe degradation and loss of fine spectral details. We propose the Flow-Matching-guided Unfolding network (FMU), which, to our knowledge, is the first to integrate flow matching into HSI reconstruction by embedding its generative prior within a deep unfolding framework. To further strengthen the learned dynamics, we introduce a mean velocity loss that enforces global consistency of the flow, leading to a more robust and accurate reconstruction. This hybrid design leverages the interpretability of optimization-based methods and the generative capacity of flow matching. Extensive experiments on both simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality. Code and models will be available at this https URL.
https://arxiv.org/abs/2510.01912
Imaging inverse problems aim to recover high-dimensional signals from undersampled, noisy measurements, a fundamentally ill-posed task with infinitely many solutions in the null-space of the sensing operator. To resolve this ambiguity, prior information is typically incorporated through handcrafted regularizers or learned models that constrain the solution space. However, these priors typically ignore the task-specific structure of that null-space. In this work, we propose Non-Linear Projections of the Null-Space (NPN), a novel class of regularization that, instead of enforcing structural constraints in the image domain, uses a neural network to promote solutions that lie in a low-dimensional projection of the sensing matrix's null-space. Our approach has two key advantages: (1) Interpretability: by focusing on the structure of the null-space, we design sensing-matrix-specific priors that capture information orthogonal to the signal components that are fundamentally blind to the sensing process. (2) Flexibility: NPN is adaptable to various inverse problems, compatible with existing reconstruction frameworks, and complementary to conventional image-domain priors. We provide theoretical guarantees on convergence and reconstruction accuracy when NPN is used within plug-and-play methods. Empirical results across diverse sensing matrices demonstrate that NPN priors consistently enhance reconstruction fidelity in various imaging inverse problems, such as compressive sensing, deblurring, super-resolution, computed tomography, and magnetic resonance imaging, with plug-and-play methods, unrolling networks, deep image prior, and diffusion models.
https://arxiv.org/abs/2510.01608
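The null-space ambiguity that the NPN abstract refers to can be seen with a few lines of linear algebra. The sketch below builds the orthogonal projector onto the null-space of a random sensing matrix and checks that the null-space component of a signal is invisible to the measurements. It illustrates only this linear decomposition; the learned, low-dimensional non-linear projection proposed in the paper is not reproduced, and the matrix sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50                        # undersampled: fewer measurements than unknowns
A = rng.standard_normal((m, n))      # hypothetical sensing matrix

A_pinv = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse
P_row = A_pinv @ A                   # projector onto the row space of A
P_null = np.eye(n) - P_row           # projector onto the null-space of A

x = rng.standard_normal(n)           # arbitrary signal
x_row = P_row @ x                    # component the measurements can see
x_null = P_null @ x                  # component the measurements cannot see

y = A @ x
y_from_row = A @ x_row               # same measurements without the null component
```

Any prior, handcrafted or learned, is effectively a rule for choosing `x_null`, since the measurements alone leave it completely undetermined.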
Self-supervised learning for inverse problems makes it possible to train a reconstruction network from noisy and/or incomplete data alone. These methods have the potential to enable learning-based solutions when obtaining ground-truth references for training is expensive or even impossible. In this paper, we propose a new self-supervised learning strategy devised for the challenging setting where measurements are observed via a single incomplete observation model. We introduce a new definition of equivariance in the context of reconstruction networks, and show that the combination of self-supervised splitting losses and equivariant reconstruction networks results in unbiased estimates of the supervised loss. Through a series of experiments on image inpainting, accelerated magnetic resonance imaging, and compressive sensing, we demonstrate that the proposed loss achieves state-of-the-art performance in settings with highly rank-deficient forward models.
https://arxiv.org/abs/2510.00929
Spectral imaging technology has long faced fundamental challenges in balancing spectral, spatial, and temporal resolutions. While compressive sensing-based Coded Aperture Snapshot Spectral Imaging (CASSI) mitigates this trade-off through optical encoding, high compression ratios result in ill-posed reconstruction problems. Traditional model-based methods exhibit limited performance due to reliance on handcrafted inherent image priors, while deep learning approaches are constrained by their black-box nature, which compromises physical interpretability. To address these limitations, we propose a dual-camera CASSI reconstruction framework that integrates total variation (TV) subgradient theory. By establishing an end-to-end SD-CASSI mathematical model, we reduce the computational complexity of solving the inverse problem and provide a mathematically well-founded framework for analyzing multi-camera systems. A dynamic regularization strategy is introduced, incorporating normalized gradient constraints from RGB/panchromatic-derived reference images, which constructs a TV subgradient similarity function with strict convex optimization guarantees. Leveraging spatial priors from auxiliary cameras, an adaptive reference generation and updating mechanism is designed to provide subgradient guidance. Experimental results demonstrate that the proposed method effectively preserves spatial-spectral structural consistency. The theoretical framework establishes an interpretable mathematical foundation for computational spectral imaging, demonstrating robust performance across diverse reconstruction scenarios. The source code is available at this https URL.
https://arxiv.org/abs/2509.10897
We explore the connection between Plug-and-Play (PnP) methods and Denoising Diffusion Implicit Models (DDIM) for solving ill-posed inverse problems, with a focus on single-pixel imaging. We begin by identifying key distinctions between PnP and diffusion models, particularly in their denoising mechanisms and sampling procedures. By decoupling the diffusion process into three interpretable stages (denoising, data consistency enforcement, and sampling), we provide a unified framework that integrates learned priors with physical forward models in a principled manner. Building upon this insight, we propose a hybrid data-consistency module that linearly combines multiple PnP-style fidelity terms. This hybrid correction is applied directly to the denoised estimate, improving measurement consistency without disrupting the diffusion sampling trajectory. Experimental results on single-pixel imaging tasks demonstrate that our method achieves better reconstruction quality.
https://arxiv.org/abs/2509.09365
Digital cameras consume ~0.1 microjoule per pixel to capture and encode video, resulting in a power usage of ~20W for a 4K sensor operating at 30 fps. For gigapixel cameras operating at 100-1000 fps, the current processing model is unsustainable. To address this, physical layer compressive measurement has been proposed to reduce power consumption per pixel by 10-100X. Video Snapshot Compressive Imaging (SCI) introduces high frequency modulation in the optical sensor layer to increase the effective frame rate. A commonly used sampling strategy for video SCI is Random Sampling (RS), where each mask element value is randomly set to 0 or 1. Similarly, image inpainting (I2P) has demonstrated that images can be recovered from a fraction of the image pixels. Inspired by I2P, we propose an Ultra-Sparse Sampling (USS) regime, where at each spatial location, only one sub-frame is set to 1 and all others are set to 0. We then build a Digital Micro-mirror Device (DMD) encoding system to verify the effectiveness of our USS strategy. Ideally, we can decompose the USS measurement into sub-measurements for which we can utilize I2P algorithms to recover high-speed frames. However, due to the mismatch between the DMD and CCD, the USS measurement cannot be perfectly decomposed. To this end, we propose BSTFormer, a sparse TransFormer that utilizes local Block attention, global Sparse attention, and global Temporal attention to exploit the sparsity of the USS measurement. Extensive results on both simulated and real-world data show that our method significantly outperforms all previous state-of-the-art algorithms. Additionally, an essential advantage of the USS strategy is its higher dynamic range than that of the RS strategy. Finally, from the application perspective, the USS strategy is a good choice for implementing a complete video SCI system on chip due to its fixed exposure time.
https://arxiv.org/abs/2509.08228
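Two quantitative statements in the USS abstract are easy to check with a small sketch: the order-of-magnitude power estimate, and the defining property of a USS mask (exactly one open sub-frame per spatial location). The 3840x2160 resolution for "4K" and the uniformly random choice of the open sub-frame are assumptions made here for illustration; the abstract does not specify either.

```python
import random

def camera_power_watts(energy_per_pixel_j, width, height, fps):
    # Power = energy per pixel * pixels per frame * frames per second.
    return energy_per_pixel_j * width * height * fps

def uss_mask(n_frames, height, width, seed=0):
    """Binary mask of shape [n_frames][height][width] in which every spatial
    location is set to 1 in exactly one sub-frame and 0 in all others."""
    rng = random.Random(seed)
    mask = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for i in range(height):
        for j in range(width):
            # Assumption: the open sub-frame is drawn uniformly at random.
            mask[rng.randrange(n_frames)][i][j] = 1
    return mask

# 0.1 microjoule per pixel, 4K (assumed 3840x2160) at 30 fps.
power = camera_power_watts(0.1e-6, 3840, 2160, 30)

# Small USS mask: 8 sub-frames over a 4x6 sensor patch.
mask = uss_mask(n_frames=8, height=4, width=6)
exposures = [[sum(mask[t][i][j] for t in range(8)) for j in range(6)]
             for i in range(4)]
```

The power estimate comes out at roughly 25 W, consistent with the abstract's "~20W" as an order-of-magnitude figure. Because every location is exposed in exactly one sub-frame, each sub-frame taken alone looks like an inpainting problem, which is the connection to I2P drawn in the abstract.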
Deep networks have achieved remarkable success in the image compressed sensing (CS) task, namely reconstructing a high-fidelity image from its compressed measurement. However, existing works are deficient in incoherent compressed measurement at the sensing phase and implicit measurement representations at the reconstruction phase, limiting the overall performance. In this work, we answer two questions: 1) how to improve the measurement incoherence to decrease the ill-posedness; 2) how to learn informative representations from measurements. To this end, we propose a novel asymmetric Kronecker CS (AKCS) model and theoretically show that it has better incoherence than previous Kronecker CS with a minimal increase in complexity. Moreover, we reveal that the superiority of unfolding networks over non-unfolding ones results from sufficient gradient descents, called explicit measurement representations. We propose a measurement-aware cross attention (MACA) mechanism to learn implicit measurement representations. We integrate AKCS and MACA into a widely used unfolding architecture to obtain a measurement-enhanced unfolding network (MEUNet). Extensive experiments demonstrate that our MEUNet achieves state-of-the-art performance in reconstruction accuracy and inference speed.
https://arxiv.org/abs/2508.09528
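Kronecker CS, which the AKCS abstract builds on, rests on a standard identity: applying a Kronecker-structured sensing matrix to a vectorized image is the same as multiplying the image by two small matrices, one on each side. The sketch below verifies that identity numerically. It shows only generic Kronecker sensing with arbitrary factor shapes; the specific asymmetric construction of AKCS is not described in the abstract and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 16, 16        # image size
m1, m2 = 6, 10         # measurements per mode (arbitrary, illustrative)

A = rng.standard_normal((m1, n1))   # acts on the image rows
B = rng.standard_normal((m2, n2))   # acts on the image columns
X = rng.standard_normal((n1, n2))   # image

# Dense route: build the (m1*m2) x (n1*n2) sensing matrix explicitly.
Phi = np.kron(A, B)
y_dense = Phi @ X.reshape(-1)       # row-major vectorization of X

# Structured route: two small matrix products, no large matrix needed.
y_fast = (A @ X @ B.T).reshape(-1)

# Storage comparison: dense matrix entries versus the two factors.
dense_entries = Phi.size
factor_entries = A.size + B.size
```

With NumPy's row-major flattening the identity reads `kron(A, B) @ vec(X) == vec(A @ X @ B.T)`, so the sensing operator costs `m1*n1 + m2*n2` stored entries instead of `m1*m2*n1*n2`.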
Sound speed profiles (SSPs) are essential underwater parameters that affect the propagation mode of underwater signals and have a critical impact on the energy efficiency of underwater acoustic communication and the accuracy of underwater acoustic positioning. Traditionally, SSPs can be obtained by matched field processing (MFP), compressive sensing (CS), and deep learning (DL) methods. However, existing methods mainly rely on on-site underwater sonar observation data, which imposes strict requirements on the deployment of sonar observation systems. To achieve high-precision estimation of the sound velocity distribution in a given sea area without on-site underwater data measurement, we propose a multi-modal data-fusion generative adversarial network model with residual attention block (MDF-RAGAN) for SSP construction. To improve the model's ability to capture global spatial feature correlations, we embed attention mechanisms, and we use residual modules to capture small disturbances in the deep ocean sound velocity distribution caused by changes in sea surface temperature (SST). Experimental results on a real open dataset show that the proposed model outperforms other state-of-the-art methods, achieving an error of less than 0.3 m/s. Specifically, MDF-RAGAN not only outperforms a convolutional neural network (CNN) and spatial interpolation (SITP) by nearly a factor of two, but also achieves about a 65.8% reduction in root mean square error (RMSE) compared to the mean profile, which reflects the improvement in overall profile matching brought by multi-source fusion and cross-modal attention.
https://arxiv.org/abs/2507.11812