This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (a 4x factor) in real time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All of the proposed methods improve PSNR fidelity over Lanczos interpolation and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
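For context, the evaluation protocol implied by the abstract can be sketched in a few lines; this is a hedged illustration using standard Pillow/NumPy APIs with placeholder file paths, not the official challenge code.

```python
# Sketch of the challenge's baseline and metric: Lanczos 4x upscaling as the
# reference and PSNR as the fidelity measure. File names are placeholders.
import time
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

lr = Image.open("input_540p.png")           # 960x540 decoded AVIF input (placeholder path)
hr = np.asarray(Image.open("gt_4k.png"))    # 3840x2160 ground truth (placeholder path)

t0 = time.perf_counter()
baseline = lr.resize((lr.width * 4, lr.height * 4), Image.LANCZOS)
elapsed_ms = (time.perf_counter() - t0) * 1000.0

print(f"Lanczos baseline: {psnr(np.asarray(baseline), hr):.2f} dB, {elapsed_ms:.1f} ms")
```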
https://arxiv.org/abs/2404.16484
The recent Local Implicit Image Function (LIIF) and subsequent Implicit Neural Representation (INR)-based works have achieved remarkable success in Arbitrary-Scale Super-Resolution (ASSR) by using MLPs to decode Low-Resolution (LR) features. However, these continuous image representations typically implement decoding in High-Resolution (HR) High-Dimensional (HD) space, leading to a quadratic increase in computational cost and seriously hindering the practical applications of ASSR. To tackle this problem, we propose a novel Latent Modulated Function (LMF), which decouples the HR-HD decoding process into shared latent decoding in LR-HD space and independent rendering in HR Low-Dimensional (LD) space, thereby realizing the first computationally optimal paradigm of continuous image representation. Specifically, LMF utilizes an HD MLP in latent space to generate latent modulations for each LR feature vector. This enables a modulated LD MLP in render space to quickly adapt to any input feature vector and perform rendering at arbitrary resolution. Furthermore, we leverage the positive correlation between modulation intensity and input image complexity to design a Controllable Multi-Scale Rendering (CMSR) algorithm, offering the flexibility to adjust decoding efficiency based on the rendering precision. Extensive experiments demonstrate that converting existing INR-based ASSR methods to LMF can reduce the computational cost by up to 99.9%, accelerate inference by up to 57 times, and save up to 76% of parameters, while maintaining competitive performance. The code is available at this https URL.
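The decoupling can be pictured with a minimal sketch, assuming illustrative dimensions (this is our reading of the abstract, not the authors' released code): the HD MLP runs once per LR feature vector to emit modulations, and the small modulated MLP is then queried cheaply at every HR coordinate.

```python
# Hedged sketch of the LMF split: shared latent decoding in LR-HD space,
# independent low-dimensional rendering in HR space. Dimensions are illustrative.
import torch
import torch.nn as nn

class LatentModulatedFunction(nn.Module):
    def __init__(self, feat_dim=64, latent_dim=256, render_dim=16):
        super().__init__()
        # HD MLP in latent space: one pass per LR feature vector.
        self.latent_mlp = nn.Sequential(
            nn.Linear(feat_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, 2 * render_dim),   # scale + shift modulations
        )
        # LD MLP in render space: cheap per-coordinate queries.
        self.render_in = nn.Linear(2, render_dim)     # 2-D relative coordinate
        self.render_out = nn.Linear(render_dim, 3)    # RGB

    def forward(self, feat, coords):
        # feat: (N, feat_dim) LR features; coords: (N, Q, 2) HR query coordinates.
        scale, shift = self.latent_mlp(feat).chunk(2, dim=-1)
        h = torch.relu(self.render_in(coords) * scale.unsqueeze(1) + shift.unsqueeze(1))
        return self.render_out(h)                     # (N, Q, 3)

rgb = LatentModulatedFunction()(torch.randn(8, 64), torch.rand(8, 100, 2))
```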
https://arxiv.org/abs/2404.16451
Satellite imaging generally presents a trade-off between the frequency of acquisitions and the spatial resolution of the images. Super-resolution is often advanced as a way to get the best of both worlds. In this work, we investigate multi-image super-resolution of satellite image time series, i.e. how multiple images of the same area acquired at different dates can help reconstruct a higher resolution observation. In particular, we extend state-of-the-art deep single and multi-image super-resolution algorithms, such as SRDiff and HighRes-net, to deal with irregularly sampled Sentinel-2 time series. We introduce BreizhSR, a new dataset for 4x super-resolution of Sentinel-2 time series using very high-resolution SPOT-6 imagery of Brittany, a French region. We show that using multiple images significantly improves super-resolution performance, and that a well-designed temporal positional encoding allows us to perform super-resolution for different times of the series. In addition, we observe a trade-off between spectral fidelity and perceptual quality of the reconstructed HR images, questioning future directions for super-resolution of Earth Observation data.
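As an illustration of the temporal positional encoding idea, irregular acquisition dates can be embedded with generic sinusoidal features; this is an assumed minimal form, not necessarily the paper's exact design.

```python
# Sinusoidal encoding of irregular acquisition dates (in days); the resulting
# (T, dim) embeddings can be added to per-image features before fusion.
import torch

def temporal_encoding(days: torch.Tensor, dim: int = 64) -> torch.Tensor:
    freqs = torch.exp(torch.linspace(0.0, -8.0, dim // 2))     # geometric frequency ladder
    angles = days[:, None] * freqs[None, :]                    # (T, dim/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

enc = temporal_encoding(torch.tensor([0.0, 5.0, 17.0, 42.0]))  # four irregular dates
```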
https://arxiv.org/abs/2404.16409
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW super-resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.
https://arxiv.org/abs/2404.16223
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers scales quadratically with image size, with correspondingly increasing computational demands, researchers are now exploring how to adapt Mamba to computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review vision Mamba models, categorizing them into foundational designs and variants enhanced with techniques such as convolution, recurrence, and attention. We further delve into the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing. This encompasses general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We specifically discuss general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
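For readers new to the topic, the selective SSM recurrence at Mamba's core can be written as a toy sequential reference; real implementations use a hardware-aware parallel scan, and the scalar input and Euler-style discretization of B below are simplifications.

```python
# Toy selective SSM: h_t = A_bar * h_{t-1} + dt * B(x_t) * x_t, y_t = <C(x_t), h_t>.
# The input-dependent B and C are the "selection" mechanism.
import torch
import torch.nn as nn

def selective_ssm(u, A, B_proj, C_proj, dt=0.1):
    # u: (T,) input sequence; A: (d_state,) diagonal (negative) state matrix.
    h = torch.zeros_like(A)
    A_bar = torch.exp(dt * A)                 # zero-order-hold discretization of A
    ys = []
    for t in range(len(u)):
        x_t = u[t : t + 1]                    # (1,)
        h = A_bar * h + dt * B_proj(x_t) * x_t
        ys.append((C_proj(x_t) * h).sum())
    return torch.stack(ys)                    # (T,)

d_state = 16
y = selective_ssm(torch.randn(128), -torch.rand(d_state),
                  nn.Linear(1, d_state), nn.Linear(1, d_state))
```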
https://arxiv.org/abs/2404.15956
Thermal imaging plays a crucial role in various applications, but the inherent low resolution of commonly available infrared (IR) cameras limits its effectiveness. Conventional super-resolution (SR) methods often struggle with thermal images due to their lack of high-frequency details. Guided SR leverages information from a high-resolution image, typically in the visible spectrum, to enhance the reconstruction of a high-resolution IR image from the low-resolution input. Inspired by SwinFusion, we propose SwinFuSR, a guided SR architecture based on Swin transformers. In real-world scenarios, however, the guiding modality (e.g., an RGB image) may be missing, so we propose a training method that improves the robustness of the model in this case. Our method has few parameters and outperforms state-of-the-art models in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM). In Track 2 of the PBVS 2024 Thermal Image Super-Resolution Challenge, it achieves 3rd place on the PSNR metric. Our code and pretrained weights are available at this https URL.
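The robustness training can be illustrated by a simple modality-dropout scheme, a minimal form we assume here rather than the authors' exact loop: with some probability the RGB guide is zeroed out, forcing the network to reconstruct from the thermal input alone.

```python
# Modality dropout on the guidance branch; guide samples are zeroed whole.
import torch

def maybe_drop_guide(rgb: torch.Tensor, p: float = 0.3) -> torch.Tensor:
    # rgb: (B, 3, H, W) guidance batch; each sample is dropped with probability p.
    keep = (torch.rand(rgb.shape[0], 1, 1, 1, device=rgb.device) > p).float()
    return rgb * keep

guide = maybe_drop_guide(torch.randn(4, 3, 128, 128))
```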
https://arxiv.org/abs/2404.14533
SEGSRNet addresses the challenge of precisely identifying surgical instruments in low-resolution stereo endoscopic images, a common issue in medical imaging and robotic surgery. Our framework enhances image clarity and segmentation accuracy by applying state-of-the-art super-resolution techniques before segmentation, ensuring higher-quality inputs for more precise segmentation. SEGSRNet combines advanced feature extraction and attention mechanisms with spatial processing to sharpen image details, which is significant for accurate tool identification in medical images. Our proposed model outperforms current models on metrics including Dice, IoU, PSNR, and SSIM, producing clearer and more accurate images for stereo endoscopic surgical imaging. By providing enhanced image resolution and precise segmentation, SEGSRNet can significantly improve surgical accuracy and patient care outcomes.
https://arxiv.org/abs/2404.13330
The recent advancement of spatial transcriptomics (ST) makes it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, integrating histology images and gene expression for super-resolved ST maps remains a challenge. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps under the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationships of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.
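The cross-modal adaptive modulation can be pictured as FiLM-style conditioning, an assumption made for illustration rather than the paper's exact network: histology features predict per-channel scales and shifts that modulate the gene-expression features.

```python
# FiLM-style cross-modal modulation: one modality conditions the other.
import torch
import torch.nn as nn

class CrossModalModulation(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.to_scale_shift = nn.Conv2d(dim, 2 * dim, kernel_size=1)

    def forward(self, gene_feat, hist_feat):
        # gene_feat, hist_feat: (B, C, H, W) aligned feature maps.
        scale, shift = self.to_scale_shift(hist_feat).chunk(2, dim=1)
        return gene_feat * (1 + scale) + shift

out = CrossModalModulation()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```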
https://arxiv.org/abs/2404.12973
Fluorescence lifetime imaging microscopy (FLIM) provides detailed information about molecular interactions and biological processes. A major bottleneck for FLIM is image resolution at high acquisition speeds, due to the engineering and signal-processing limitations of time-resolved imaging technology. Here we present single-sample image-fusion upsampling (SiSIFUS), a data-fusion approach to computational FLIM super-resolution that combines measurements from a low-resolution time-resolved detector (which measures photon arrival time) and a high-resolution camera (which measures intensity only). To solve this otherwise ill-posed inverse retrieval problem, we introduce statistically informed priors that encode local and global dependencies between the two single-sample measurements. This bypasses the risk of out-of-distribution hallucination present in traditional data-driven approaches and delivers enhanced images compared, for example, to standard bilinear interpolation. The general approach laid out by SiSIFUS can be applied to other image super-resolution problems where two different datasets are available.
https://arxiv.org/abs/2404.13102
Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capability. This raises a fundamental question: can we extend the success of a generative image upsampler to the VSR task while preserving the temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that can produce videos with high-frequency details and temporal consistency. VideoGigaGAN builds upon a large-scale image upsampler -- GigaGAN. Simply inflating GigaGAN to a video model by adding temporal modules produces severe temporal flickering. We identify several key issues and propose techniques that significantly improve the temporal consistency of upsampled videos. Our experiments show that, unlike previous VSR methods, VideoGigaGAN generates temporally consistent videos with more fine-grained appearance details. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with $8\times$ super-resolution.
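The "naive inflation" the abstract warns against can be pictured as inserting per-pixel attention along the time axis into the image upsampler; the toy module below shows the idea only, and the paper's contribution lies in the fixes beyond it.

```python
# Toy temporal module: attention across frames, applied independently per pixel.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, C, H, W) -> attend over T at each spatial location.
        B, T, C, H, W = x.shape
        seq = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(B, H, W, T, C).permute(0, 3, 4, 1, 2)

y = TemporalAttention()(torch.randn(1, 8, 64, 16, 16))
```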
https://arxiv.org/abs/2404.12388
Recently, in the super-resolution (SR) domain, transformers have outperformed CNNs with fewer FLOPs and fewer parameters, since they can model long-range dependencies and adaptively adjust weights based on the instance. In this paper, we demonstrate that CNNs, although they receive less attention in the current SR domain, surpass transformers in direct efficiency measures. By incorporating the advantages of transformers into CNNs, we aim to achieve both computational efficiency and enhanced performance. However, using a large kernel in the SR domain, which mainly processes large images, incurs a large computational overhead. To overcome this, we propose novel approaches to employing the large kernel, which can reduce latency by 86\% compared to the naive large kernel, and leverage an Element-wise Attention module to imitate instance-dependent weights. As a result, we introduce Partial Large Kernel CNNs for Efficient Super-Resolution (PLKSR), which achieves state-of-the-art performance on four datasets at a scale of $\times$4, with reductions of 68.1\% in latency and 80.2\% in maximum GPU memory occupancy compared to SRFormer-light.
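Under our reading of the abstract (the released PLKSR code may differ), a "partial large kernel" convolution lets only a slice of the channels pay the large-kernel cost, as in the sketch below.

```python
# Partial large kernel: a channel split where only the first `split` channels
# pass through the expensive large-kernel convolution.
import torch
import torch.nn as nn

class PartialLargeKernelConv(nn.Module):
    def __init__(self, dim=64, split=16, kernel=17):
        super().__init__()
        self.split = split
        self.lk = nn.Conv2d(split, split, kernel, padding=kernel // 2)

    def forward(self, x):
        a, b = x[:, :self.split], x[:, self.split:]   # large-kernel part / identity part
        return torch.cat([self.lk(a), b], dim=1)

y = PartialLargeKernelConv()(torch.randn(1, 64, 48, 48))
```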
https://arxiv.org/abs/2404.11848
Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter challenges in acquiring global information. To activate more input pixels globally, hybrid attention models have been proposed. Moreover, training by solely minimizing pixel-wise RGB losses, such as L1, has been found inadequate for capturing essential high-frequency details. This paper presents two contributions: i) We introduce convolutional non-local sparse attention (NLSA) blocks to extend the hybrid transformer architecture in order to further enhance its receptive field. ii) We employ wavelet losses to train Transformer models to improve quantitative and subjective performance. While wavelet losses have been explored previously, showing their power in training Transformer-based SR models is novel. Our experimental results demonstrate that the proposed model provides state-of-the-art PSNR results as well as superior visual performance across various benchmark datasets.
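A wavelet loss can be as simple as penalizing differences between wavelet sub-bands of the prediction and the target; the single-level unnormalized Haar variant below is a generic illustration and may differ from the paper's exact formulation.

```python
# L1 over the four sub-bands of a single-level 2-D Haar transform.
import torch
import torch.nn.functional as F

def haar_subbands(x):
    # x: (B, C, H, W) with even H and W.
    s_v = x[..., 0::2, :] + x[..., 1::2, :]           # vertical sums
    d_v = x[..., 0::2, :] - x[..., 1::2, :]           # vertical differences
    ll, hl = s_v[..., 0::2] + s_v[..., 1::2], s_v[..., 0::2] - s_v[..., 1::2]
    lh, hh = d_v[..., 0::2] + d_v[..., 1::2], d_v[..., 0::2] - d_v[..., 1::2]
    return ll, lh, hl, hh

def wavelet_l1(pred, target):
    return sum(F.l1_loss(p, t) for p, t in zip(haar_subbands(pred), haar_subbands(target)))

loss = wavelet_l1(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
```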
https://arxiv.org/abs/2404.11273
Image super-resolution is a fundamentally ill-posed problem because multiple valid high-resolution images exist for one low-resolution image. Super-resolution methods based on diffusion probabilistic models can deal with the ill-posed nature by learning the distribution of high-resolution images conditioned on low-resolution images, avoiding the problem of blurry images in PSNR-oriented methods. However, existing diffusion-based super-resolution methods have high time consumption with the use of iterative sampling, while the quality and consistency of generated images are less than ideal due to problems like color shifting. In this paper, we propose Efficient Conditional Diffusion Model with Probability Flow Sampling (ECDP) for image super-resolution. To reduce the time consumption, we design a continuous-time conditional diffusion model for image super-resolution, which enables the use of probability flow sampling for efficient generation. Additionally, to improve the consistency of generated images, we propose a hybrid parametrization for the denoiser network, which interpolates between the data-predicting parametrization and the noise-predicting parametrization for different noise scales. Moreover, we design an image quality loss as a complement to the score matching loss of diffusion models, further improving the consistency and quality of super-resolution. Extensive experiments on DIV2K, ImageNet, and CelebA demonstrate that our method achieves higher super-resolution quality than existing diffusion-based image super-resolution methods while having lower time consumption. Our code is available at this https URL.
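The hybrid parametrization can be sketched as blending the two readings of the denoiser output; the variance-exploding form x_t = x_0 + sigma * eps and the scalar blend weight w(sigma) below are assumptions for illustration.

```python
# Blend data-prediction and noise-prediction interpretations of the output.
import torch

def hybrid_to_x0(out, x_t, sigma, w):
    # w in [0, 1]: 1 reads `out` as predicted x0, 0 reads it as predicted noise.
    x0_from_data = out
    x0_from_eps = x_t - sigma * out
    return w * x0_from_data + (1.0 - w) * x0_from_eps

x0 = hybrid_to_x0(torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32), sigma=0.5, w=0.5)
```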
https://arxiv.org/abs/2404.10688
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low- and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated as a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from it was used to determine the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from it was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. Together, these submissions gauge the state of the art in efficient single-image super-resolution. To facilitate reproducibility and enable other researchers to build upon these findings, the code and pre-trained models of the validated solutions are made publicly available at this https URL.
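Only the structure of the main-track ranking is stated above; the weights and normalized sub-track scores in this sketch are placeholders, not the official ones.

```python
# Main-track score as a weighted sum of sub-track scores (placeholder weights).
def main_track_score(runtime_score, flops_score, params_score,
                     weights=(0.5, 0.25, 0.25)):
    return sum(w * s for w, s in zip(weights, (runtime_score, flops_score, params_score)))

print(main_track_score(0.9, 0.7, 0.8))
```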
https://arxiv.org/abs/2404.10343
Recently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address this problem, we propose Super-Resolution 3D Gaussian Splatting (SRGS) to perform the optimization in a high-resolution (HR) space. The sub-pixel constraint is introduced for the increased viewpoints in HR space, exploiting the sub-pixel cross-view information of the multiple low-resolution (LR) views. The gradient accumulated from more viewpoints will facilitate the densification of primitives. Furthermore, a pre-trained 2D super-resolution model is integrated with the sub-pixel constraint, enabling these dense primitives to learn faithful texture features. In general, our method focuses on densification and texture learning to effectively enhance the representation ability of primitives. Experimentally, our method achieves high rendering quality on HRNVS only with LR inputs, outperforming state-of-the-art methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. Related codes will be released upon acceptance.
https://arxiv.org/abs/2404.10318
Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of the generated images and a lack of effective out-of-domain generalization. Image generation methods represented by diffusion models provide strong priors for visual tasks and have been shown to apply effectively to image restoration tasks. Leveraging the image priors of the Stable Diffusion (SD) model, we achieve omnidirectional image super-resolution with both fidelity and realness, dubbed OmniSSR. Firstly, we transform the equirectangular projection (ERP) images into tangent projection (TP) images, whose distribution approximates the planar image domain. Then, we use SD to iteratively sample initial high-resolution results. At each denoising iteration, we further correct and update the initial results using the proposed Octadecaplex Tangent Information Interaction (OTII) and Gradient Decomposition (GD) techniques to ensure better consistency. Finally, the TP images are transformed back to obtain the final high-resolution results. Our method is zero-shot, requiring no training or fine-tuning. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed method.
https://arxiv.org/abs/2404.10312
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
https://arxiv.org/abs/2404.09790
Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from a high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training strategies with a single teacher and simple loss functions. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) framework specifically for image super-resolution. It exploits the advantages of multiple teachers by combining and enhancing the outputs of these teacher models, which then guide the learning process of the compact student network. To achieve more effective learning, we have also developed a new wavelet-based loss function for MTKD, which can better optimize the training process by observing differences in both the spatial and frequency domains. We fully evaluate the effectiveness of the proposed method by comparing it to five commonly used KD methods for image super-resolution based on three popular network architectures. The results show that the proposed MTKD method achieves evident improvements in super-resolution performance, up to 0.46 dB (based on PSNR), over state-of-the-art KD approaches across different network structures. The source code of MTKD will be made available here for public evaluation.
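A minimal sketch consistent with this description, with our simplifications clearly in place of the paper's components: a mean teacher ensemble instead of the enhanced combination, and plain L1 instead of the wavelet-based loss.

```python
# Multi-teacher distillation: the student matches a combined teacher target
# plus the ground-truth HR image.
import torch
import torch.nn.functional as F

def mtkd_loss(student_sr, teacher_srs, hr):
    target = torch.stack(teacher_srs).mean(dim=0)   # simplified teacher combination
    return F.l1_loss(student_sr, target) + F.l1_loss(student_sr, hr)

loss = mtkd_loss(torch.randn(1, 3, 64, 64),
                 [torch.randn(1, 3, 64, 64) for _ in range(3)],
                 torch.randn(1, 3, 64, 64))
```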
https://arxiv.org/abs/2404.09571
Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth / z-axis resolution, and an insufficient amount of high-quality volumetric data. Leveraging the abundance of high-resolution 2D microscopy data, we introduce masked slice diffusion for super-resolution (MSDSR), which exploits the inherent equivalence in the data-generating distribution across all spatial dimensions of biological specimens. This intrinsic characteristic allows for super-resolution models trained on high-resolution images from one plane (e.g., XY) to effectively generalize to others (XZ, YZ), overcoming the traditional dependency on orientation. We focus on the application of MSDSR to stimulated Raman histology (SRH), an optical imaging modality for biological specimen analysis and intraoperative diagnosis, characterized by its rapid acquisition of high-resolution 2D images but slow and costly optical z-sectioning. To evaluate MSDSR's efficacy, we introduce a new performance metric, SliceFID, and demonstrate MSDSR's superior performance over baseline models through extensive evaluations. Our findings reveal that MSDSR not only significantly enhances the quality and resolution of 3D volumetric data, but also addresses major obstacles hindering the broader application of 3D volumetric microscopy in clinical diagnostics and biomedical research.
https://arxiv.org/abs/2404.09425
Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-cost clinical examination routine, has long been considered impossible. Here we propose a High-abundant Pulmonary Artery-vein Segmentation (HiPaS) framework achieving accurate artery-vein segmentation on both non-contrast CT and CTPA across various spatial resolutions. HiPaS first performs spatial normalization on raw CT scans via a super-resolution module, and then iteratively achieves segmentation results at different branch levels by utilizing the low-level vessel segmentation as a prior for high-level vessel segmentation. We trained and validated HiPaS on our established multi-centric dataset comprising 1,073 CT volumes with meticulous manual annotation. Both quantitative experiments and clinical evaluation demonstrated the superior performance of HiPaS, achieving a dice score of 91.8% and a sensitivity of 98.0%. Further experiments demonstrated the non-inferiority of HiPaS segmentation on non-contrast CT compared to segmentation on CTPA. Employing HiPaS, we have conducted an anatomical study of pulmonary vasculature on 10,613 participants in China (five sites), discovering a new association between pulmonary vessel abundance and sex and age: vessel abundance is significantly higher in females than in males, and slightly decreases with age, under the controlling of lung volumes (p < 0.0001). HiPaS realizing accurate artery-vein segmentation delineates a promising avenue for clinical diagnosis and understanding pulmonary physiology in a non-invasive manner.
https://arxiv.org/abs/2404.07671