Purpose: To develop and validate a novel image reconstruction technique using implicit neural representations (INR) for multi-view thick-slice acquisitions, reducing scan time while maintaining a high signal-to-noise ratio (SNR). Methods: We propose Rotating-view super-resolution (ROVER)-MRI, an unsupervised neural network-based algorithm designed to reconstruct MRI data from multi-view thick slices, effectively reducing scan time 2-fold while preserving fine anatomical details. We compare our method to both bicubic interpolation and the current state-of-the-art regularized least-squares super-resolution reconstruction (LS-SRR) technique. Validation is performed using ground-truth ex-vivo monkey brain data, and we demonstrate superior reconstruction quality across several in-vivo human datasets. Notably, we achieve the reconstruction of a whole-brain in-vivo T2-weighted image at an unprecedented 180 μm isotropic spatial resolution, accomplished in just 17 minutes of scan time on a 7T MRI scanner. Results: ROVER-MRI outperformed the LS-SRR method in reconstruction quality, with 22.4% lower relative error (RE) and 7.5% lower full-width at half maximum (FWHM), indicating better preservation of fine structural details in nearly half the scan time. Conclusion: ROVER-MRI offers an efficient and robust approach for mesoscale MR imaging, enabling rapid, high-resolution whole-brain scans. Its versatility holds great promise for research applications requiring anatomical detail and time-efficient imaging.
https://arxiv.org/abs/2502.08634
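The forward model behind multi-view thick-slice reconstruction is that each acquired view is a stack of thin slices averaged along one orientation; the INR is optimized so that its rendered volume reproduces every view. A minimal numpy sketch of the slice-averaging operator, assuming a simple boxcar slice profile (the real pipeline also models view rotation, which is omitted here):

```python
import numpy as np

def thick_slice_forward(vol, factor, axis=0):
    # simulate a thick-slice acquisition: average groups of `factor`
    # thin slices along `axis` (boxcar slice profile assumed)
    vol = np.moveaxis(vol, axis, 0)
    n = (vol.shape[0] // factor) * factor
    out = vol[:n].reshape(-1, factor, *vol.shape[1:]).mean(axis=1)
    return np.moveaxis(out, 0, axis)

rng = np.random.default_rng(0)
hr = rng.random((8, 4, 4))             # hypothetical high-resolution volume
lr = thick_slice_forward(hr, 2, axis=0)  # 2x thicker slices along axis 0
```

Training then minimizes the mismatch between `thick_slice_forward` applied to the INR output and each acquired thick-slice stack.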
Due to storage and bandwidth limitations, videos stored and transmitted on the Internet are usually of low quality, with low resolution and compression noise. Although video super-resolution (VSR) is an effective technique for enhancing video resolution, relatively few VSR methods focus on compressed videos. Directly applying general VSR approaches fails to improve practical videos, especially when frames are highly compressed at a low bit rate. Recently, diffusion models have achieved superior performance in low-level visual tasks, and their high-realism generation capability enables them to be applied to VSR. To synthesize more compression-lost details and refine temporal consistency, we propose a novel Spatial Degradation-Aware and Temporal Consistent (SDATC) diffusion model for compressed VSR. Specifically, we introduce a distortion control module (DCM) to modulate diffusion model inputs and guide the generation. Next, the diffusion model executes the denoising process for texture generation with a fine-tuned spatial prompt-based compression-aware module (PCAM) and a spatio-temporal attention module (STAM). PCAM extracts features to encode specific compression information dynamically. STAM extends the spatial attention mechanism to a spatio-temporal dimension for capturing temporal correlation. Extensive experimental results on benchmark datasets demonstrate the effectiveness of the proposed modules in enhancing compressed videos.
https://arxiv.org/abs/2502.07381
Compressed video super-resolution (SR) aims to generate high-resolution (HR) videos from the corresponding low-resolution (LR) compressed videos. Recently, some compressed video SR methods attempt to exploit the spatio-temporal information in the frequency domain, showing great promise in super-resolution performance. However, these methods do not differentiate various frequency subbands spatially or capture the temporal frequency dynamics, potentially leading to suboptimal results. In this paper, we propose a deep frequency-based compressed video SR model (FCVSR) consisting of a motion-guided adaptive alignment (MGAA) network and a multi-frequency feature refinement (MFFR) module. Additionally, a frequency-aware contrastive loss is proposed for training FCVSR, in order to reconstruct finer spatial details. The proposed model has been evaluated on three public compressed video super-resolution datasets, with results demonstrating its effectiveness when compared to existing works in terms of super-resolution performance (up to a 0.14dB gain in PSNR over the second-best model) and complexity.
https://arxiv.org/abs/2502.06431
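The abstract does not specify how FCVSR constructs its frequency subbands; one common low-overhead choice is radial masking in the Fourier domain, sketched below with hypothetical cutoff values (the MFFR module presumably operates on learned features rather than raw pixels):

```python
import numpy as np

def frequency_subbands(img, cutoffs=(0.15, 0.4)):
    # split an image into low/mid/high-frequency parts via radial FFT masks
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)  # normalized radius
    bands, lo = [], 0.0
    for hi in (*cutoffs, np.inf):
        mask = (r >= lo) & (r < hi)
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(f * mask))))
        lo = hi
    return bands

img = np.random.default_rng(1).random((16, 16))
low, mid, high = frequency_subbands(img)
```

Because the masks partition the spectrum, the subbands sum back to the original image, which makes per-band refinement losses easy to combine.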
In the context of Omni-Directional Image (ODI) Super-Resolution (SR), a unique challenge arises from the non-uniform oversampling characteristics caused by EquiRectangular Projection (ERP). Considerable efforts in designing complex spherical convolutions or polyhedron reprojections offer significant performance improvements, but at the expense of cumbersome processing procedures and slower inference speeds. Under these circumstances, this paper proposes a new ODI-SR model characterized by its capacity to perform Fast and Arbitrary-scale ODI-SR processes, denoted as FAOR. The key innovation lies in adapting the implicit image function from the planar image domain to the ERP image domain by incorporating spherical geometric priors at both the latent representation and image reconstruction stages, in a low-overhead manner. Specifically, at the latent representation stage, we adopt a pair of pixel-wise and semantic-wise sphere-to-planar distortion maps to perform affine transformations on the latent representation, thereby endowing it with spherical properties. Moreover, during the image reconstruction stage, we introduce a geodesic-based resampling strategy, aligning the implicit image function with spherical geometry without introducing additional parameters. As a result, the proposed FAOR outperforms state-of-the-art ODI-SR models with a much faster inference speed. Extensive experimental results and ablation studies demonstrate the effectiveness of our design.
https://arxiv.org/abs/2502.05902
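Geodesic-based resampling on ERP images rests on two standard pieces of spherical geometry: mapping ERP pixel coordinates to longitude/latitude, and measuring great-circle (geodesic) distance between points. A small sketch of both (the actual resampling weights used by FAOR are not specified here):

```python
import numpy as np

def erp_to_sphere(u, v, w, h):
    # map ERP pixel coords (u, v) on a w x h grid to (longitude, latitude) in radians
    lon = (u + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v + 0.5) / h * np.pi
    return lon, lat

def geodesic_dist(lon1, lat1, lon2, lat2):
    # great-circle distance on the unit sphere (haversine form, numerically stable)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

Sampling neighbors by geodesic distance rather than planar pixel distance is what corrects the oversampling near the poles of the ERP grid.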
Hyperspectral imaging (HSI) captures spatial and spectral data, enabling analysis of features invisible to conventional systems. The technology is vital in fields such as weather monitoring, food quality control, counterfeit detection, and healthcare diagnostics, and extends into defense, agriculture, and industrial automation. HSI has advanced with improvements in spectral resolution, miniaturization, and computational methods. This study provides an overview of HSI, its applications, challenges in data fusion, and the role of deep learning models in processing HSI data. We discuss how integration of multimodal HSI with AI, particularly deep learning, improves classification accuracy and operational efficiency. Deep learning enhances HSI analysis in areas such as feature extraction, change detection, denoising, unmixing, dimensionality reduction, land-cover mapping, data augmentation, spectral reconstruction, and super-resolution. An emerging focus is the fusion of hyperspectral cameras with large language models (LLMs), referred to as high-brain LLMs, enabling the development of advanced applications such as low-visibility crash detection and face anti-spoofing. We also highlight key players in the HSI industry, its compound annual growth rate, and its growing industrial significance. The purpose is to offer insight to both technical and non-technical audiences, covering HSI imagery, trends, and future directions, while providing valuable information on HSI datasets and software libraries.
https://arxiv.org/abs/2502.06894
Nuclear Magnetic Resonance (NMR) spectroscopy is a crucial analytical technique used for molecular structure elucidation, with applications spanning chemistry, biology, materials science, and medicine. However, the frequency resolution of NMR spectra is limited by the "field strength" of the instrument. High-field NMR instruments provide high-resolution spectra but are prohibitively expensive, whereas lower-field instruments offer more accessible, but lower-resolution, results. This paper introduces an AI-driven approach that not only enhances the frequency resolution of NMR spectra through super-resolution techniques but also provides multi-scale functionality. By leveraging a diffusion model, our method can reconstruct high-field spectra from low-field NMR data, offering flexibility in generating spectra at varying magnetic field strengths. These reconstructions are comparable to those obtained from high-field instruments, enabling finer spectral details and improving molecular characterization. To date, our approach is one of the first to overcome the limitations of instrument field strength, achieving NMR super-resolution through AI. This cost-effective solution makes high-resolution analysis accessible to more researchers and industries, without the need for multimillion-dollar equipment.
https://arxiv.org/abs/2502.06845
Dataset distillation is the concept of condensing large datasets into smaller but highly representative synthetic samples. While previous research has primarily focused on image classification, its application to image Super-Resolution (SR) remains underexplored. This exploratory work studies multiple dataset distillation techniques applied to SR, including pixel- and latent-space approaches under different aspects. Our experiments demonstrate that a 91.12% dataset size reduction can be achieved while maintaining comparable SR performance to the full dataset. We further analyze initialization strategies and distillation methods to optimize memory efficiency and computational costs. Our findings provide new insights into dataset distillation for SR and set the stage for future advancements.
https://arxiv.org/abs/2502.03656
Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Achieving successful VSR requires producing realistic HR details and ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with their tile-based approach, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach to produce spatially and temporally consistent VSR results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a Temporal Attention Propagation (TAP) scheme that propagate information across spatio-temporal tiles based on the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.
https://arxiv.org/abs/2502.03502
Diffusion models (DMs) have significantly advanced the development of real-world image super-resolution (Real-ISR), but the computational cost of multi-step diffusion models limits their application. One-step diffusion models generate high-quality images in a single sampling step, greatly reducing computational overhead and inference latency. However, most existing one-step diffusion methods are constrained by the performance of the teacher model, where poor teacher performance results in image artifacts. To address this limitation, we propose FluxSR, a novel one-step diffusion Real-ISR technique based on flow matching models. We use the state-of-the-art diffusion model FLUX.1-dev as both the teacher model and the base model. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR model. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss and introduce an Attention Diversification Loss (ADL) as a regularization term to reduce token similarity in the transformer, thereby eliminating high-frequency artifacts. Comprehensive experiments demonstrate that our method outperforms existing one-step diffusion-based Real-ISR methods. The code and model will be released at this https URL.
https://arxiv.org/abs/2502.01993
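The abstract describes the Attention Diversification Loss only as a penalty on token similarity in the transformer; one plausible formulation (an assumption, not necessarily the paper's exact loss) is the mean off-diagonal cosine similarity between token features:

```python
import numpy as np

def attention_diversification_loss(tokens):
    # tokens: (N, d) feature vectors; penalize mean pairwise cosine
    # similarity between distinct tokens (hypothetical ADL form)
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = t @ t.T
    n = sim.shape[0]
    return sim[~np.eye(n, dtype=bool)].mean()
```

Minimizing this pushes token representations apart, which is the stated goal of reducing token similarity to suppress high-frequency artifacts.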
Transformer-based video super-resolution (VSR) models have set new benchmarks in recent years, but their substantial computational demands make most of them unsuitable for deployment on resource-constrained devices. Achieving a balance between model complexity and output quality remains a formidable challenge in VSR. Although lightweight models have been introduced to address this issue, they often struggle to deliver state-of-the-art performance. We propose a novel lightweight, parameter-efficient deep residual deformable convolution network for VSR. Unlike prior methods, our model enhances feature utilization through residual connections and employs deformable convolution for precise frame alignment, addressing motion dynamics effectively. Furthermore, we introduce a single memory tensor to capture information accrued from the past frames and improve motion estimation across frames. This design enables an efficient balance between computational cost and reconstruction quality. With just 2.3 million parameters, our model achieves state-of-the-art SSIM of 0.9175 on the REDS4 dataset, surpassing existing lightweight and many heavy models in both accuracy and resource efficiency. Architectural insights from our model pave the way for real-time VSR on streaming data.
https://arxiv.org/abs/2502.01816
Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from slow inference. To address this, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. Unlike previously developed DBM distillation techniques, the proposed method can distill both conditional and unconditional types of DBMs, distill models into a one-step generator, and use only the corrupted images for training. We evaluate our approach for both conditional and unconditional bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks, and show that our distillation technique accelerates DBM inference by 4x to 100x and, depending on the particular setup, even provides better generation quality than the teacher model.
https://arxiv.org/abs/2502.01362
Neural network based Optimal Transport (OT) is a recent and fruitful direction in the generative modeling community. It finds its applications in various fields such as domain translation, image super-resolution, computational biology and others. Among the existing approaches to OT, of considerable interest are adversarial minimax solvers based on semi-dual formulations of OT problems. While promising, these methods lack theoretical investigation from a statistical learning perspective. Our work fills this gap by establishing upper bounds on the generalization error of an approximate OT map recovered by the minimax quadratic OT solver. Importantly, the bounds we derive depend solely on some standard statistical and mathematical properties of the considered functional classes (neural networks). While our analysis focuses on the quadratic OT, we believe that similar bounds could be derived for more general OT formulations, paving the promising direction for future research.
https://arxiv.org/abs/2502.01310
Due to the trade-off between the temporal and spatial resolution of thermal spaceborne sensors, super-resolution methods have been developed to provide fine-scale Land Surface Temperature (LST) maps. Most of them are trained at low resolution but applied at fine resolution, and so they require a scale-invariance hypothesis that is not always appropriate. The main contribution of this work is the introduction of a Scale-Invariance-Free approach for training Neural Network (NN) models, and the implementation of two NN models, called Scale-Invariance-Free Convolutional Neural Networks for Super-Resolution (SIF-CNN-SR), for the super-resolution of MODIS LST products. The Scale-Invariance-Free approach consists of training the models to provide LST maps at high spatial resolution that recover the initial LST when degraded to low resolution and that contain fine-scale textures informed by the high-resolution NDVI. The second contribution of this work is the release of a test database of ASTER LST images concomitant with MODIS ones that can be used for the evaluation of super-resolution algorithms. We compare the two proposed models, SIF-CNN-SR1 and SIF-CNN-SR2, with four state-of-the-art methods, Bicubic, DMS, ATPRK, and Tsharp, and with a CNN sharing the same architecture as SIF-CNN-SR but trained under the scale-invariance hypothesis. We show that SIF-CNN-SR1 outperforms the state-of-the-art methods and the other two CNN models as evaluated with LPIPS and Fourier-space metrics focusing on the analysis of textures. These results and the available ASTER-MODIS database for evaluation are promising for future studies on the super-resolution of LST.
https://arxiv.org/abs/2502.01204
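The Scale-Invariance-Free objective can be read as a degradation-consistency constraint: the predicted high-resolution LST, degraded back to the sensor resolution, must reproduce the observed map (the NDVI-driven texture term is omitted here). A minimal sketch of the consistency term, assuming simple block-average degradation:

```python
import numpy as np

def block_downsample(img, f):
    # degrade an HR map to LR by f x f block averaging (simplified sensor model)
    h, w = img.shape
    return img[:h // f * f, :w // f * f].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def sif_consistency_loss(sr_lst, lr_lst, f):
    # data-fidelity term: the degraded SR map should match the observed LR LST
    return np.mean((block_downsample(sr_lst, f) - lr_lst) ** 2)

lr = np.random.default_rng(0).random((4, 4))
sr = np.kron(lr, np.ones((2, 2)))  # a trivially consistent 2x upsampling
```

Because the constraint is enforced at the target resolution, no scale-invariance assumption about the mapping itself is needed.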
Deep learning-based single-image super-resolution (SISR) technology focuses on enhancing low-resolution (LR) images into high-resolution (HR) ones. Although significant progress has been made, challenges remain in computational complexity and quality, particularly in remote sensing image processing. To address these issues, we propose the Omni-Scale RWKV Super-Resolution (OmniRWKVSR) model, a novel approach that combines the Receptance Weighted Key Value (RWKV) architecture with feature extraction techniques such as Visual RWKV Spatial Mixing (VRSM) and Visual RWKV Channel Mixing (VRCM), aiming to overcome the limitations of existing methods and achieve superior SISR performance. This work provides effective solutions for high-quality image reconstruction. On 4x super-resolution tasks, compared to the MambaIR model, we achieved an average improvement of 0.26% in PSNR and 0.16% in SSIM.
https://arxiv.org/abs/2502.00404
While super-resolution (SR) methods based on diffusion models (DM) have demonstrated inspiring performance, their deployment is impeded by heavy memory and computation requirements. Recent researchers apply two kinds of methods to compress or accelerate the DM. One is to compress the DM to 1-bit, aka binarization, alleviating the storage and computation pressure. The other distills the multi-step DM into a single step, significantly speeding up the inference process. Nonetheless, it remains infeasible to deploy DMs on resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. To prevent the catastrophic collapse of the model caused by binarization, we propose a sparse matrix branch (SMB) and a low-rank matrix branch (LRMB). Both auxiliary branches pass full-precision (FP) information, but in different ways. SMB absorbs the extreme values, and its output is high rank, carrying abundant FP information. The design of LRMB is inspired by LoRA and is initialized with the top-r SVD components, outputting a low-rank representation. The computation and storage overhead of our proposed branches can be safely ignored. Comprehensive comparison experiments show that BiMaCoSR outperforms current state-of-the-art binarization methods and achieves competitive performance compared with the FP one-step model. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup ratio compared to its FP counterpart. Our code and model are available at this https URL.
https://arxiv.org/abs/2502.00333
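A structural sketch of the three paths described above: a binarized main branch plus sparse and LoRA-style low-rank branches carrying full-precision residual information. How the residual is split between the branches is an assumption for illustration; the paper's exact compensation scheme may differ:

```python
import numpy as np

def binarize(w):
    # sign binarization with a per-row full-precision scale (a standard 1-bit scheme)
    return np.sign(w) * np.abs(w).mean(axis=1, keepdims=True)

def bimacosr_linear(x, w, rank=2, k_sparse=8):
    w_bin = binarize(w)   # 1-bit main path
    res = w - w_bin       # full-precision residual
    # sparse branch: keep the k largest-magnitude residual entries (extreme values)
    thresh = np.sort(np.abs(res).ravel())[-k_sparse]
    w_sp = np.where(np.abs(res) >= thresh, res, 0.0)
    # low-rank branch: LoRA-style top-r SVD of the remaining residual
    u, s, vt = np.linalg.svd(res - w_sp, full_matrices=False)
    w_lr = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return x @ (w_bin + w_sp + w_lr).T

rng = np.random.default_rng(0)
x, w = rng.standard_normal((2, 16)), rng.standard_normal((8, 16))
```

With small `rank` and `k_sparse`, the extra parameters and compute are negligible next to the dense layer, which is the point of the design.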
Image Super-Resolution (ISR) has seen significant progress with the introduction of remarkable generative models. However, challenges such as the trade-off between fidelity and realism, as well as computational complexity, have limited their application. Building upon the tremendous success of autoregressive models in the language domain, we propose VARSR, a novel visual autoregressive modeling framework for ISR in the form of next-scale prediction. To effectively integrate and preserve semantic information in low-resolution images, we propose using prefix tokens to incorporate the condition. Scale-aligned rotary positional encodings are introduced to capture spatial structures, and a diffusion refiner is utilized to model the quantization residual loss and achieve pixel-level fidelity. Image-based classifier-free guidance is proposed to guide the generation of more realistic images. Furthermore, we collect large-scale data and design a training process to obtain robust generative priors. Quantitative and qualitative results show that VARSR is capable of generating high-fidelity and high-realism images more efficiently than diffusion-based methods. Our codes will be released at this https URL.
https://arxiv.org/abs/2501.18993
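Image-based classifier-free guidance follows the same combination rule as the familiar text-conditioned version, with the low-resolution image as the condition; a one-line sketch (the guidance scale value is illustrative):

```python
import numpy as np

def image_cfg(pred_cond, pred_uncond, scale=1.5):
    # standard classifier-free guidance combination; in VARSR the
    # condition is the LR image rather than a text prompt
    return pred_uncond + scale * (pred_cond - pred_uncond)
```

At `scale=1` this reduces to the conditional prediction; larger scales extrapolate away from the unconditional one, trading diversity for adherence to the condition.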
Magnetic Resonance Imaging (MRI) offers critical insights into microstructural details; however, the spatial resolution of standard 1.5T imaging systems is often limited. In contrast, 7T MRI provides significantly enhanced spatial resolution, enabling finer visualization of anatomical structures. Despite this, the high cost and limited availability of 7T MRI hinder its widespread use in clinical settings. To address this challenge, a novel Super-Resolution (SR) model is proposed to generate 7T-like MRI from standard 1.5T MRI scans. Our approach leverages a diffusion-based architecture, incorporating gradient nonlinearity correction and bias field correction data from 7T imaging as guidance. Moreover, to improve deployability, a progressive distillation strategy is introduced. Specifically, the student model refines the 7T SR task step by step, leveraging feature maps from the inference phase of the teacher model as guidance, allowing the student to progressively approach 7T SR performance with a smaller, deployable model size. Experimental results demonstrate that our baseline teacher model achieves state-of-the-art SR performance. The student model, while lightweight, sacrifices minimal performance. Furthermore, the student model can accept MRI inputs at varying resolutions without retraining, further enhancing deployment flexibility. The clinical relevance of our proposed method is validated using clinical data from Massachusetts General Hospital. Our code is available at this https URL.
https://arxiv.org/abs/2501.18736
Mamba has demonstrated exceptional performance in visual tasks due to its powerful global modeling capabilities and linear computational complexity, offering considerable potential in hyperspectral image super-resolution (HSISR). However, in HSISR, Mamba faces challenges as transforming images into 1D sequences neglects the spatial-spectral structural relationships between locally adjacent pixels, and its performance is highly sensitive to input order, which affects the restoration of both spatial and spectral details. In this paper, we propose HSRMamba, a contextual spatial-spectral modeling state space model for HSISR, to address these issues both locally and globally. Specifically, a local spatial-spectral partitioning mechanism is designed to establish patch-wise causal relationships among adjacent pixels in 3D features, mitigating the local forgetting issue. Furthermore, a global spectral reordering strategy based on spectral similarity is employed to enhance the causal representation of similar pixels across both spatial and spectral dimensions. Finally, experimental results demonstrate our HSRMamba outperforms the state-of-the-art methods in quantitative quality and visual results. Code will be available soon.
https://arxiv.org/abs/2501.18500
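The global spectral reordering strategy can be illustrated with a greedy ordering that places spectrally similar bands next to each other, so the 1D causal scan sees smoothly varying content. This greedy nearest-neighbor heuristic is an assumption for illustration; the paper may use a different similarity criterion:

```python
import numpy as np

def spectral_reorder(bands):
    # bands: (B, H, W); greedily order bands so adjacent ones are most similar
    flat = bands.reshape(bands.shape[0], -1).astype(float)
    flat -= flat.mean(axis=1, keepdims=True)
    flat /= np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8
    sim = flat @ flat.T  # pairwise correlation between bands
    order, remaining = [0], set(range(1, len(bands)))
    while remaining:
        nxt = max(remaining, key=lambda j: sim[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

bands = np.random.default_rng(0).random((5, 4, 4))
order = spectral_reorder(bands)
```

The reordered sequence is what gets fed to the state space model; the inverse permutation restores the original band order afterwards.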
Deep learning has achieved significant success in single hyperspectral image super-resolution (SHSR); however, the high spectral dimensionality leads to a heavy computational burden, making it difficult to deploy in real-time scenarios. To address this issue, this paper proposes a novel lightweight SHSR network, i.e., LKCA-Net, that incorporates channel attention to calibrate multi-scale channel features of hyperspectral images. Furthermore, we demonstrate, for the first time, that the low-rank property of the learnable upsampling layer is a key bottleneck in lightweight SHSR methods. To address this, we employ a low-rank approximation strategy to optimize the parameter redundancy of the learnable upsampling layer. Additionally, we introduce a knowledge distillation-based feature alignment technique to ensure the low-rank approximated network retains the same feature representation capacity as the original. We conducted extensive experiments on the Chikusei, Houston 2018, and Pavia Center datasets, comparing against several state-of-the-art methods. The results demonstrate that our method is competitive in performance while achieving speedups of several dozen to even hundreds of times over other well-performing SHSR methods.
https://arxiv.org/abs/2501.18664
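The low-rank approximation of the learnable upsampling layer amounts to replacing its dense weight with two thin factors obtained from a truncated SVD, after which the knowledge-distillation step realigns features. A numpy sketch of the factorization step (layer shapes are illustrative):

```python
import numpy as np

def low_rank_factorize(w, rank):
    # replace dense weight W (out, in) with thin factors A (out, r) @ B (r, in)
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))  # a rank-8 weight
a, b = low_rank_factorize(w, 8)
```

The parameter count drops from 64*64 to 2*64*8; when the true weight is approximately low rank, as the paper argues for the upsampling layer, the factored layer loses little accuracy.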
Image restoration aims to recover details and enhance contrast in degraded images. With the growing demand for high-quality imaging (e.g., 4K and 8K), achieving a balance between restoration quality and computational efficiency has become increasingly critical. Existing methods, primarily based on CNNs, Transformers, or their hybrid approaches, apply uniform deep representation extraction across the image. However, these methods often struggle to effectively model long-range dependencies and largely overlook the spatial characteristics of image degradation (regions with richer textures tend to suffer more severe damage), making it hard to achieve the best trade-off between restoration quality and efficiency. To address these issues, we propose a novel texture-aware image restoration method, TAMambaIR, which simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Specifically, we introduce a novel Texture-Aware State Space Model, which enhances texture awareness and improves efficiency by modulating the transition matrix of the state-space equation and focusing on regions with complex textures. Additionally, we design a Multi-Directional Perception Block to improve multi-directional receptive fields while maintaining low computational overhead. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency, establishing it as a robust and efficient framework for image restoration.
https://arxiv.org/abs/2501.16583