As digital content becomes increasingly ubiquitous, the need for robust watermark removal techniques has grown, in part because existing embedding techniques lack robustness. This paper introduces a novel Saliency-Aware Diffusion Reconstruction (SADRE) framework for watermark elimination on the web, combining adaptive noise injection, region-specific perturbations, and advanced diffusion-based reconstruction. SADRE disrupts embedded watermarks by injecting targeted noise into latent representations guided by saliency masks while preserving essential image features. A reverse diffusion process ensures high-fidelity image restoration, leveraging adaptive noise levels determined by watermark strength. Our framework is theoretically grounded with stability guarantees and achieves robust watermark removal across diverse scenarios. Empirical evaluations on state-of-the-art (SOTA) watermarking techniques demonstrate SADRE's superiority in balancing watermark disruption and image quality. SADRE sets a new benchmark for watermark elimination, offering a flexible and reliable solution for real-world web content. Code is available at this https URL.
https://arxiv.org/abs/2504.12809
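To make the mechanism above concrete, here is a minimal sketch of saliency-guided, strength-adaptive noise injection into a latent; the assumption that noise is attenuated in salient regions, the noise schedule, and all names are illustrative rather than taken from SADRE.

```python
import torch

def inject_targeted_noise(latent, saliency, watermark_strength, base_sigma=0.5):
    """Add Gaussian noise to a latent, attenuated where the saliency mask is high.

    latent:             (B, C, H, W) latent representation of the image
    saliency:           (B, 1, H, W) mask in [0, 1]; high values mark features to keep
    watermark_strength: scalar in [0, 1], estimated from the watermarked image
    """
    sigma = base_sigma * watermark_strength        # adaptive noise level
    noise = torch.randn_like(latent) * sigma
    return latent + noise * (1.0 - saliency)       # spare the salient content

# toy usage: the noised latent would then be handed to a reverse-diffusion sampler
# starting from a timestep matched to sigma.
z = torch.randn(1, 4, 64, 64)
m = torch.rand(1, 1, 64, 64)
z_noisy = inject_targeted_noise(z, m, watermark_strength=0.7)
```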
Restoring images afflicted by complex real-world degradations remains challenging, as conventional methods often fail to adapt to the unique mixture and severity of artifacts present. This stems from a reliance on indirect cues which poorly capture the true perceptual quality deficit. To address this fundamental limitation, we introduce AdaQual-Diff, a diffusion-based framework that integrates perceptual quality assessment directly into the generative restoration process. Our approach establishes a mathematical relationship between regional quality scores from DeQAScore and optimal guidance complexity, implemented through an Adaptive Quality Prompting mechanism. This mechanism systematically modulates prompt structure according to measured degradation severity: regions with lower perceptual quality receive computationally intensive, structurally complex prompts with precise restoration directives, while higher quality regions receive minimal prompts focused on preservation rather than intervention. The technical core of our method lies in the dynamic allocation of computational resources proportional to degradation severity, creating a spatially-varying guidance field that directs the diffusion process with mathematical precision. By combining this quality-guided approach with content-specific conditioning, our framework achieves fine-grained control over regional restoration intensity without requiring additional parameters or inference iterations. Experimental results demonstrate that AdaQual-Diff achieves visually superior restorations across diverse synthetic and real-world datasets.
https://arxiv.org/abs/2504.12605
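As a rough illustration of quality-dependent prompt complexity, the toy function below maps a regional quality score to a prompt and token budget; the score range, thresholds, budgets, and prompt wording are assumptions, not the paper's settings.

```python
def build_region_prompt(quality_score, max_tokens=32, min_tokens=4):
    """Map a regional perceptual-quality score to a prompt of matching complexity.

    Assumes quality_score lies in [0, 5], as in a DeQAScore-style mean opinion score:
    low scores get long, directive prompts; high scores get minimal 'preserve' prompts.
    """
    severity = max(0.0, min(1.0, 1.0 - quality_score / 5.0))
    budget = int(min_tokens + severity * (max_tokens - min_tokens))
    if severity > 0.6:
        text = "restore fine texture, remove noise and blur, recover sharp edges"
    elif severity > 0.3:
        text = "lightly denoise and sharpen while keeping the original detail"
    else:
        text = "preserve the existing content"
    return {"token_budget": budget, "prompt": text}

print(build_region_prompt(1.2))   # severely degraded region -> heavy prompt
print(build_region_prompt(4.6))   # nearly clean region -> minimal prompt
```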
Medical image restoration tasks aim to recover high-quality images from degraded observations, a need that arises in many clinical scenarios, such as low-dose CT image denoising, MRI super-resolution, and MRI artifact removal. Despite the success achieved by existing deep learning-based restoration methods with sophisticated modules, they struggle to produce computationally efficient reconstructions. Moreover, they usually ignore the reliability of the restoration results, which is especially critical in medical systems. To alleviate these issues, we present LRformer, a Lightweight Transformer-based method via Reliability-guided learning in the frequency domain. Specifically, inspired by the uncertainty quantification in Bayesian neural networks (BNNs), we develop a Reliable Lesion-Semantic Prior Producer (RLPP). RLPP leverages Monte Carlo (MC) estimators with stochastic sampling operations to generate sufficiently reliable priors by performing multiple inferences on the foundational medical image segmentation model, MedSAM. Additionally, instead of directly incorporating the priors in the spatial domain, we decompose the cross-attention (CA) mechanism into real symmetric and imaginary anti-symmetric parts via the fast Fourier transform (FFT), resulting in the design of the Guided Frequency Cross-Attention (GFCA) solver. By leveraging the conjugate symmetric property of the FFT, GFCA reduces the computational complexity of naive CA by nearly half. Extensive experimental results on various tasks demonstrate the superiority of the proposed LRformer in both effectiveness and efficiency.
https://arxiv.org/abs/2504.11286
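A small sketch of the two ingredients as I read them: Monte-Carlo prior generation via repeated stochastic forward passes (the tiny network stands in for MedSAM), and the conjugate-symmetry argument behind the frequency-domain attention; everything here is illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Placeholder segmentation network; the paper uses MedSAM instead."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(0.2),                      # stochasticity for MC sampling
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_prior(model, image, n_samples=8):
    model.train()                                   # keep dropout active at inference
    samples = torch.stack([model(image) for _ in range(n_samples)])
    mean = samples.mean(0)                          # lesion-semantic prior map
    var = samples.var(0)                            # uncertainty: low variance = reliable
    return mean, var

img = torch.rand(1, 1, 64, 64)
prior, uncertainty = mc_prior(TinySegNet(), img)

# Conjugate symmetry of the FFT of a real signal: a 64x64 real feature map keeps only
# 64 x 33 unique frequency bins after rfft2, which is roughly where a ~2x reduction in
# frequency-domain attention cost can come from.
freq = torch.fft.rfft2(img)
print(freq.shape)    # (1, 1, 64, 33)
```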
Current self-supervised methods, such as contrastive learning, predominantly focus on global discrimination, neglecting the critical fine-grained anatomical details required for accurate radiographic analysis. To address this challenge, we propose an Anatomy-driven self-supervised framework for enhancing Fine-grained Representation in radiographic image analysis (AFiRe). The core idea of AFiRe is to align the anatomical consistency with the unique token-processing characteristics of Vision Transformer. Specifically, AFiRe synergistically performs two self-supervised schemes: (i) Token-wise anatomy-guided contrastive learning, which aligns image tokens based on structural and categorical consistency, thereby enhancing fine-grained spatial-anatomical discrimination; (ii) Pixel-level anomaly-removal restoration, which particularly focuses on local anomalies, thereby refining the learned discrimination with detailed geometrical information. Additionally, we propose Synthetic Lesion Mask to enhance anatomical diversity while preserving intra-consistency, which is typically corrupted by traditional data augmentations, such as Cropping and Affine transformations. Experimental results show that AFiRe: (i) provides robust anatomical discrimination, achieving more cohesive feature clusters compared to state-of-the-art contrastive learning methods; (ii) demonstrates superior generalization, surpassing 7 radiography-specific self-supervised methods in multi-label classification tasks with limited labeling; and (iii) integrates fine-grained information, enabling precise anomaly detection using only image-level annotations.
https://arxiv.org/abs/2504.10972
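The token-wise anatomy-guided contrastive idea can be sketched as an InfoNCE-style loss in which positives are tokens sharing an anatomical label; this is one plausible reading of the scheme, with all shapes and the temperature chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def anatomy_token_contrastive(tokens, labels, temperature=0.1):
    """tokens: (N, D) ViT token embeddings; labels: (N,) anatomical region ids."""
    z = F.normalize(tokens, dim=-1)
    sim = z @ z.t() / temperature                        # (N, N) cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))      # never contrast a token with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    counts = pos.sum(1)
    valid = counts > 0                                    # anchors with at least one positive
    return -(log_prob.sum(1)[valid] / counts[valid]).mean()

tokens = torch.randn(16, 32)
labels = torch.randint(0, 4, (16,))    # toy anatomical region ids
print(anatomy_token_contrastive(tokens, labels))
```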
Image restoration (IR), as a fundamental multimedia data processing task, has a significant impact on downstream visual applications. In recent years, researchers have focused on developing general-purpose IR models capable of handling diverse degradation types, thereby reducing the cost and complexity of model development. Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mamba. CNNs excel in efficient inference, whereas Transformers and Mamba excel at capturing long-range dependencies and modeling global contexts. While each architecture has demonstrated success in specialized, single-task settings, limited efforts have been made to effectively integrate heterogeneous architectures to jointly address diverse IR challenges. To bridge this gap, we propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion. RestorMixer adopts a three-stage encoder-decoder structure, where each stage is tailored to the resolution and feature characteristics of the input. In the initial high-resolution stage, CNN-based blocks are employed to rapidly extract shallow local features. In the subsequent stages, we integrate a refined multi-directional scanning Mamba module with a multi-scale window-based self-attention mechanism. This hierarchical and adaptive design enables the model to leverage the strengths of CNNs in local feature extraction, Mamba in global context modeling, and attention mechanisms in dynamic feature refinement. Extensive experimental results demonstrate that RestorMixer achieves leading performance across multiple IR tasks while maintaining high inference efficiency. The official code can be accessed at this https URL.
https://arxiv.org/abs/2504.10967
Poisson-Gaussian noise describes the noise of many imaging systems, hence the need for efficient algorithms for Poisson-Gaussian image restoration. Deep learning methods offer state-of-the-art performance but often require sensor-specific training when used in a supervised setting. A promising alternative is given by plug-and-play (PnP) methods, which learn only a regularization through a denoiser, allowing images from several sources to be restored with the same network. This paper introduces PG-DPIR, an efficient PnP method for high-count Poisson-Gaussian inverse problems, adapted from DPIR. While DPIR is designed for white Gaussian noise, a naive adaptation to Poisson-Gaussian noise leads to prohibitively slow algorithms due to the absence of a closed-form proximal operator. To address this, we adapt DPIR for the specificities of Poisson-Gaussian noise and propose in particular an efficient initialization of the gradient descent required for the proximal step that accelerates convergence by several orders of magnitude. Experiments are conducted on satellite image restoration and super-resolution problems. High-resolution realistic Pleiades images are simulated for the experiments, which demonstrate that PG-DPIR achieves state-of-the-art performance with improved efficiency, which seems promising for on-ground satellite processing chains.
https://arxiv.org/abs/2504.10375
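The computational issue and the proposed remedy can be sketched as follows: because the Poisson-Gaussian likelihood (approximated here as a heteroscedastic Gaussian) has no closed-form proximal operator, the data-fidelity step is solved by gradient descent, warm-started from the closed-form prox of a fixed-variance surrogate. Noise parameters, step counts, and the denoising setting (H = identity) are assumptions of this sketch, not the paper's configuration.

```python
import torch

def pg_nll(x, y, a=0.01, b=1e-4):
    """Heteroscedastic-Gaussian approximation of the Poisson-Gaussian likelihood:
    the variance a*x + b depends on the (unknown) clean signal x, so the proximal
    operator of this term has no closed form."""
    var = a * x.clamp(min=0.0) + b
    return 0.5 * ((y - x) ** 2 / var).sum()

def prox_pg(z, y, tau=0.05, a=0.01, b=1e-4, steps=30, lr=0.02):
    """argmin_x pg_nll(x, y) + ||x - z||^2 / (2*tau), solved by gradient descent
    warm-started from the closed-form prox of a fixed-variance Gaussian surrogate."""
    var0 = a * y.clamp(min=0.0) + b                    # plug-in variance estimate
    x = ((tau * y + var0 * z) / (tau + var0)).clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = pg_nll(x, y, a, b) + ((x - z) ** 2).sum() / (2 * tau)
        loss.backward()
        opt.step()
    return x.detach()

y = torch.rand(1, 1, 32, 32)     # noisy observation (denoising case, H = identity)
z = y.clone()                    # current output of the plug-and-play denoiser
x_hat = prox_pg(z, y)
```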
Image restoration involves recovering high-quality images from their corrupted versions, requiring a nuanced balance between spatial details and contextual information. While certain methods address this balance, they predominantly emphasize spatial aspects, neglecting frequency variation comprehension. In this paper, we present a multi-scale design that optimally balances these competing objectives, seamlessly integrating spatial and frequency domain knowledge to selectively recover the most informative content. Specifically, we develop a hybrid scale frequency selection block (HSFSBlock), which not only captures multi-scale information from the spatial domain, but also selects the most informative components for image restoration in the frequency domain. Furthermore, to mitigate the inherent noise introduced by skip connections employing only addition or concatenation, we introduce a skip connection attention mechanism (SCAM) to selectively determine the information that should propagate through skip connections. The resulting tightly interlinked architecture is named LCDNet. Extensive experiments conducted across diverse image restoration tasks showcase that our model attains performance levels that are either superior or comparable to those of state-of-the-art algorithms.
https://arxiv.org/abs/2504.10558
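One plausible, much-simplified form of "selecting informative components in the frequency domain" is a learned gate over rFFT bins, as in the toy module below; the layer sizes and gating form are not from the paper.

```python
import torch
import torch.nn as nn

class FrequencySelect(nn.Module):
    """Gate frequency bins of a feature map and return to the spatial domain."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        freq = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1) complex
        mag = torch.cat([freq.real, freq.imag], dim=1)   # real-valued gate input
        g = self.gate(mag)                               # per-bin selection weights
        freq = freq * g                                  # keep informative components
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")

x = torch.randn(1, 8, 32, 32)
print(FrequencySelect(8)(x).shape)    # (1, 8, 32, 32)
```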
Even though Deep Neural Networks are extremely powerful for image restoration tasks, they have several limitations. They are poorly understood and suffer from strong biases inherited from the training sets. One way to address these shortcomings is to have a better control over the training sets, in particular by using synthetic sets. In this paper, we propose a synthetic image generator relying on a few simple principles. In particular, we focus on geometric modeling, textures, and a simple modeling of image acquisition. These properties, integrated in a classical Dead Leaves model, enable the creation of efficient training sets. Standard image denoising and super-resolution networks can be trained on such datasets, reaching performance almost on par with training on natural image datasets. As a first step towards explainability, we provide a careful analysis of the considered principles, identifying which image properties are necessary to obtain good performances. Besides, such training also yields better robustness to various geometric and radiometric perturbations of the test sets.
https://arxiv.org/abs/2504.10201
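For readers unfamiliar with the Dead Leaves model, the snippet below generates a basic instance: disks with power-law radii and random colours, drawn so that earlier (front-most) leaves occlude later ones. The texturing and image-acquisition modelling the paper adds on top are omitted, and the parameter values are arbitrary.

```python
import numpy as np

def dead_leaves(size=256, n_disks=2000, rmin=4, rmax=100, alpha=3.0, seed=0):
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size, 3), dtype=np.float32)
    covered = np.zeros((size, size), dtype=bool)
    yy, xx = np.mgrid[0:size, 0:size]
    for _ in range(n_disks):
        # inverse-transform sample of a power-law radius distribution ~ r^(-alpha),
        # typical of natural-image statistics
        u = rng.random()
        r = (rmin ** (1 - alpha) + u * (rmax ** (1 - alpha) - rmin ** (1 - alpha))) ** (1 / (1 - alpha))
        cx, cy = rng.integers(0, size, 2)
        color = rng.random(3)
        disk = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        new = disk & ~covered                # front-most leaves were drawn first
        img[new] = color
        covered |= disk
        if covered.all():
            break
    return img

im = dead_leaves()
print(im.shape, im.mean())
```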
Diabetic retinopathy (DR) is a leading cause of vision impairment, making its early diagnosis through fundus imaging critical for effective treatment planning. However, the presence of poor-quality fundus images caused by factors such as inadequate illumination, noise, blurring, and other motion artifacts poses a significant challenge for accurate DR screening. In this study, we propose progressive transfer learning (PTL) for multi-pass restoration to iteratively enhance the quality of degraded fundus images, ensuring more reliable DR screening. Unlike previous methods that often focus on a single restoration pass, multi-pass restoration via PTL can achieve superior blind restoration performance that can even improve most of the good-quality fundus images in the dataset. Initially, a CycleGAN model is trained to restore low-quality images, followed by PTL-induced restoration passes over the latest restored outputs to improve overall quality in each pass. The proposed method can learn blind restoration without requiring any paired data while surpassing its limitations by leveraging progressive learning and fine-tuning strategies to minimize distortions and preserve critical retinal features. To evaluate PTL's effectiveness on multi-pass restoration, we conducted experiments on DeepDRiD, a large-scale fundus imaging dataset specifically curated for diabetic retinopathy detection. Our results demonstrate state-of-the-art performance, showcasing PTL's potential as a superior approach to iterative image quality restoration.
https://arxiv.org/abs/2504.10025
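The overall training loop, as the abstract describes it, is a repeated restore-then-adapt cycle; the sketch below shows only that control flow, with trivial stand-ins for the CycleGAN restorer and the PTL fine-tuning step.

```python
def progressive_multipass_restore(images, restorer, finetune, n_passes=3):
    """Apply the restorer repeatedly, briefly adapting it after each pass."""
    current = images
    for p in range(n_passes):
        current = [restorer(img) for img in current]    # restoration pass p
        if p < n_passes - 1:
            restorer = finetune(restorer, current)      # adapt to the new quality level
    return current

# toy usage with placeholder "images" and a no-op fine-tuning step
restored = progressive_multipass_restore(
    images=[[0.1, 0.2], [0.3, 0.4]],
    restorer=lambda img: [min(1.0, v * 1.05) for v in img],
    finetune=lambda model, outputs: model,
)
print(restored)
```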
All-in-one image restoration, addressing diverse degradation types with a unified model, presents significant challenges in designing task-specific prompts that effectively guide restoration across multiple degradation scenarios. While adaptive prompt learning enables end-to-end optimization, it often yields overlapping or redundant task representations. Conversely, explicit prompts derived from pretrained classifiers enhance discriminability but may discard critical visual information for reconstruction. To address these limitations, we introduce Contrastive Prompt Learning (CPL), a novel framework that fundamentally enhances prompt-task alignment through two complementary innovations: a Sparse Prompt Module (SPM) that efficiently captures degradation-specific features while minimizing redundancy, and a Contrastive Prompt Regularization (CPR) that explicitly strengthens task boundaries by incorporating negative prompt samples across different degradation types. Unlike previous approaches that focus primarily on degradation classification, CPL optimizes the critical interaction between prompts and the restoration model itself. Extensive experiments across five comprehensive benchmarks demonstrate that CPL consistently enhances state-of-the-art all-in-one restoration models, achieving significant improvements in both standard multi-task scenarios and challenging composite degradation settings. Our framework establishes new state-of-the-art performance while maintaining parameter efficiency, offering a principled solution for unified image restoration.
https://arxiv.org/abs/2504.09973
Real-world image de-weathering aims at removing various undesirable weather-related artifacts, e.g., rain, snow, and fog. To this end, acquiring ideal training pairs is crucial. Existing real-world datasets are typically constructed as paired data by extracting clean and degraded images from live streams of landscape scenes on the Internet. Despite the use of strict filtering mechanisms during collection, training pairs inevitably encounter inconsistency in terms of lighting, object position, scene details, etc., making de-weathering models possibly suffer from deformation artifacts under non-ideal supervision. In this work, we propose a unified solution for real-world image de-weathering with non-ideal supervision, i.e., a pseudo-label guided learning framework, to address various inconsistencies within the real-world paired dataset. Generally, it consists of a de-weathering model (De-W) and a Consistent Label Constructor (CLC), by which the restoration result can be adaptively supervised by the original ground-truth image to recover sharp textures while maintaining consistency with the degraded inputs in non-weather content through the supervision of pseudo-labels. Particularly, a Cross-frame Similarity Aggregation (CSA) module is deployed within CLC to enhance the quality of pseudo-labels by exploring the potential complementary information of multiple frames through a graph model. Moreover, we introduce an Information Allocation Strategy (IAS) to integrate the original ground-truth images and pseudo-labels, thereby facilitating joint supervision for the training of the de-weathering model. Extensive experiments demonstrate that our method exhibits significant advantages when trained on imperfectly aligned de-weathering datasets in comparison with other approaches.
https://arxiv.org/abs/2504.09949
Transformer-based models have achieved remarkable success in natural language and vision tasks, but their application to gene expression analysis remains limited due to data sparsity, high dimensionality, and missing values. We present GexBERT, a transformer-based autoencoder framework for robust representation learning of gene expression data. GexBERT learns context-aware gene embeddings by pretraining on large-scale transcriptomic profiles with a masking and restoration objective that captures co-expression relationships among thousands of genes. We evaluate GexBERT across three critical tasks in cancer research: pan-cancer classification, cancer-specific survival prediction, and missing value imputation. GexBERT achieves state-of-the-art classification accuracy from limited gene subsets, improves survival prediction by restoring expression of prognostic anchor genes, and outperforms conventional imputation methods under high missingness. Furthermore, its attention-based interpretability reveals biologically meaningful gene patterns across cancer types. These findings demonstrate the utility of GexBERT as a scalable and effective tool for gene expression modeling, with translational potential in settings where gene coverage is limited or incomplete.
https://arxiv.org/abs/2504.09704
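The masking-and-restoration pretraining objective can be illustrated with a toy setup in which random gene values are zeroed and the model is penalized only on the masked positions; the tiny MLP stands in for the transformer encoder, and the mask ratio and dimensions are arbitrary choices.

```python
import torch
import torch.nn as nn

n_genes = 512
model = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, n_genes))

def masked_restoration_loss(expr, mask_ratio=0.15):
    """expr: (B, n_genes) log-normalized expression profiles."""
    mask = torch.rand_like(expr) < mask_ratio
    corrupted = expr.masked_fill(mask, 0.0)          # hide the selected genes
    restored = model(corrupted)
    return ((restored - expr)[mask] ** 2).mean()     # penalize only masked positions

expr = torch.randn(8, n_genes)
loss = masked_restoration_loss(expr)
loss.backward()
print(loss.item())
```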
Large-scale pre-trained diffusion models have produced excellent results in the field of conditional image generation. However, restoration of ancient murals, as an important downstream task in this field, poses significant challenges to diffusion model-based restoration methods due to its large defective area and scarce training samples. Conditional restoration tasks are more concerned with whether the restored part meets the aesthetic standards of mural restoration in terms of overall style and seam detail, and metrics for evaluating such heuristic image completion are lacking in current research. We therefore propose DiffuMural, a combined Multi-scale convergence and Collaborative Diffusion mechanism with ControlNet and cyclic consistency loss to optimise the matching between the generated images and the conditional control. DiffuMural demonstrates outstanding capabilities in mural restoration, leveraging training data from 23 large-scale Dunhuang murals that exhibit consistent visual aesthetics. The model excels in restoring intricate details, achieving a coherent overall appearance, and addressing the unique challenges posed by incomplete murals lacking factual grounding. Our evaluation framework incorporates four key metrics to quantitatively assess incomplete murals: factual accuracy, textural detail, contextual semantics, and holistic visual coherence. Furthermore, we integrate humanistic value assessments to ensure the restored murals retain their cultural and artistic significance. Extensive experiments validate that our method outperforms state-of-the-art (SOTA) approaches in both qualitative and quantitative metrics.
https://arxiv.org/abs/2504.09513
All-in-one image restoration, which aims to address diverse degradations within a unified framework, is critical for practical applications. However, existing methods rely on predicting and integrating degradation conditions, which can misactivate degradation-specific features in complex scenarios, limiting their restoration performance. To address this issue, we propose a novel all-in-one image restoration framework guided by Histograms of Oriented Gradients (HOG), named HOGformer. By leveraging the degradation-discriminative capability of HOG descriptors, HOGformer employs a dynamic self-attention mechanism that adaptively attends to long-range spatial dependencies based on degradation-aware HOG cues. To enhance the degradation sensitivity of attention inputs, we design a HOG-guided local dynamic-range convolution module that captures long-range degradation similarities while maintaining awareness of global structural information. Furthermore, we propose a dynamic interaction feed-forward module, efficiently increasing the model capacity to adapt to different degradations through channel-spatial interactions. Extensive experiments across diverse benchmarks, including adverse weather and natural degradations, demonstrate that HOGformer achieves state-of-the-art performance and generalizes effectively to complex real-world degradations. Code is available at this https URL.
https://arxiv.org/abs/2504.09377
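As a reminder of what the guiding descriptor contains, here is a compact per-cell HOG computation; the cell size and bin count are arbitrary, and how the resulting histograms enter the attention mechanism is not shown.

```python
import numpy as np

def hog_cells(gray, cell=8, bins=9):
    """gray: (H, W) float image -> (H//cell, W//cell, bins) orientation histograms."""
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned orientation
    H, W = gray.shape
    hc, wc = H // cell, W // cell
    hist = np.zeros((hc, wc, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(hc):
        for j in range(wc):
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j] = np.bincount(b, weights=m, minlength=bins)
    return hist

img = np.random.rand(64, 64)
h = hog_cells(img)
print(h.shape)   # (8, 8, 9): one descriptor per cell, usable as a degradation-aware cue
```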
Video imaging is often affected by complex degradations such as blur, noise, and compression artifacts. Traditional restoration methods follow a "single-task single-model" paradigm, resulting in poor generalization and high computational cost, limiting their applicability in real-world scenarios with diverse degradation types. We propose UniFlowRestore, a general video restoration framework that models restoration as a time-continuous evolution under a prompt-guided and physics-informed vector field. A physics-aware backbone PhysicsUNet encodes degradation priors as potential energy, while PromptGenerator produces task-relevant prompts as momentum. These components define a Hamiltonian system whose vector field integrates inertial dynamics, decaying physical gradients, and prompt-based guidance. The system is optimized via a fixed-step ODE solver to achieve efficient and unified restoration across tasks. Experiments show that UniFlowRestore delivers state-of-the-art performance with strong generalization and efficiency, attaining the highest PSNR (33.89 dB) and SSIM (0.97) on the video denoising task while maintaining top or second-best scores across all evaluated tasks.
https://arxiv.org/abs/2504.09069
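The Hamiltonian-style flow can be caricatured as a fixed-step integrator whose velocity field combines inertia, a decaying physics gradient, and a prompt-derived force; all force terms, step sizes, and decay rates below are stand-ins, not the paper's formulation.

```python
import torch

def restore_by_flow(x0, potential_grad, prompt_force, steps=20, dt=0.05,
                    decay=0.9, friction=0.1):
    """Fixed-step (Euler) integration of a momentum-based restoration flow."""
    x, p = x0.clone(), torch.zeros_like(x0)          # state and momentum
    w = 1.0                                          # weight of the physics gradient
    for _ in range(steps):
        dp = -w * potential_grad(x) + prompt_force(x) - friction * p
        p = p + dt * dp
        x = x + dt * p                               # inertial update of the image
        w *= decay                                   # physics term decays over time
    return x

# toy usage: a quadratic potential pulls toward the degraded observation, while a
# prompt force pulls toward a (here random) target direction
y = torch.rand(1, 3, 32, 32)
target = torch.rand_like(y)
x_hat = restore_by_flow(
    y,
    potential_grad=lambda x: x - y,
    prompt_force=lambda x: 0.1 * (target - x),
)
print(x_hat.shape)
```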
Recent progress in generative models has significantly improved image restoration capabilities, particularly through powerful diffusion models that offer remarkable recovery of semantic details and local fidelity. However, deploying these models at ultra-high resolutions faces a critical trade-off between quality and efficiency due to the computational demands of long-range attention mechanisms. To address this, we introduce ZipIR, a novel framework that enhances efficiency, scalability, and long-range modeling for high-res image restoration. ZipIR employs a highly compressed latent representation that compresses the image by a factor of 32, effectively reducing the number of spatial tokens and enabling the use of high-capacity models like the Diffusion Transformer (DiT). Toward this goal, we propose a Latent Pyramid VAE (LP-VAE) design that structures the latent space into sub-bands to ease diffusion training. Trained on full images up to 2K resolution, ZipIR surpasses existing diffusion-based methods, offering unmatched speed and quality in restoring high-resolution images from severely degraded inputs.
https://arxiv.org/abs/2504.08591
Image restoration is critical for improving the quality of degraded images, which is vital for applications like autonomous driving, security surveillance, and digital content enhancement. However, existing methods are often tailored to specific degradation scenarios, limiting their adaptability to the diverse and complex challenges in real-world environments. Moreover, real-world degradations are typically non-uniform, highlighting the need for adaptive and intelligent solutions. To address these issues, we propose a novel vision-language-guided universal restoration (VL-UR) framework. VL-UR leverages a zero-shot contrastive language-image pre-training (CLIP) model to enhance image restoration by integrating visual and semantic information. A scene classifier is introduced to adapt CLIP, generating high-quality language embeddings aligned with degraded images while predicting degraded types for complex scenarios. Extensive experiments across eleven diverse degradation settings demonstrate VL-UR's state-of-the-art performance, robustness, and adaptability. This positions VL-UR as a transformative solution for modern image restoration challenges in dynamic, real-world environments.
https://arxiv.org/abs/2504.08219
Restoring severely blurred images remains a significant challenge in computer vision, impacting applications in autonomous driving, medical imaging, and photography. This paper introduces a novel training strategy based on curriculum learning to improve the robustness of deep learning models for extreme image deblurring. Unlike conventional approaches that train on only low to moderate blur levels, our method progressively increases the difficulty by introducing images with higher blur severity over time, allowing the model to adapt incrementally. Additionally, we integrate perceptual and hinge loss during training to enhance fine detail restoration and improve training stability. We experimented with various curriculum learning strategies and explored the impact of the train-test domain gap on the deblurring performance. Experimental results on the Extreme-GoPro dataset showed that our method outperforms the next best method by 14% in SSIM, whereas experiments on the Extreme-KITTI dataset showed that our method outperforms the next best by 18% in SSIM. Ablation studies showed that a linear curriculum progression outperforms step-wise, sigmoid, and exponential progressions, while hyperparameter settings such as the training blur percentage and loss function formulation all play important roles in addressing extreme blur artifacts. Datasets and code are available at this https URL
https://arxiv.org/abs/2504.08072
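A linear curriculum over blur severity, the schedule the ablations favour, reduces to a simple progression of the maximum blur level allowed when sampling training pairs; the level range and granularity here are illustrative, not the paper's settings.

```python
import random

def max_blur_level(epoch, total_epochs, min_level=1, max_level=10):
    """Linear progression of the allowed blur severity from min_level to max_level."""
    t = min(1.0, epoch / max(1, total_epochs - 1))
    return min_level + t * (max_level - min_level)

def sample_blur_level(epoch, total_epochs):
    """Draw a blur level for one training pair under the current curriculum cap."""
    return random.uniform(1, max_blur_level(epoch, total_epochs))

for e in [0, 25, 50, 75, 99]:
    print(e, round(max_blur_level(e, 100), 2))
```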
Deep learning (DL), a pivotal technology in artificial intelligence, has recently gained substantial traction in the domain of dental auxiliary diagnosis. However, its application has predominantly been confined to imaging modalities such as panoramic radiographs and Cone Beam Computed Tomography, with limited focus on auxiliary analysis specifically targeting Periapical Radiographs (PR). PR are the most extensively utilized imaging modality in endodontics and periodontics due to their capability to capture detailed local lesions at a low cost. Nevertheless, challenges such as resolution limitations and artifacts complicate the annotation and recognition of PR, leading to a scarcity of publicly available, large-scale, high-quality PR analysis datasets. This scarcity has somewhat impeded the advancement of DL applications in PR analysis. In this paper, we present PRAD-10K, a dataset for PR analysis. PRAD-10K comprises 10,000 clinical periapical radiograph images, with pixel-level annotations provided by professional dentists for nine distinct anatomical structures, lesions, and artificial restorations or medical devices. We also include classification labels for images with typical conditions or lesions. Furthermore, we introduce a DL network named PRNet to establish benchmarks for PR segmentation tasks. Experimental results demonstrate that PRNet surpasses previous state-of-the-art medical image segmentation models on the PRAD-10K dataset. The codes and dataset will be made publicly available.
https://arxiv.org/abs/2504.07760
Neural representations for video (NeRV) have gained considerable attention for their strong performance across various video tasks. However, existing NeRV methods often struggle to capture fine spatial details, resulting in vague reconstructions. In this paper, we present a Frequency Separation and Augmentation based Neural Representation for video (FANeRV), which addresses these limitations with its core Wavelet Frequency Upgrade Block. This block explicitly separates input frames into high- and low-frequency components using the discrete wavelet transform, followed by targeted enhancement using specialized modules. Finally, a specially designed gated network effectively fuses these frequency components for optimal reconstruction. Additionally, convolutional residual enhancement blocks are integrated into the later stages of the network to balance parameter distribution and improve the restoration of high-frequency details. Experimental results demonstrate that FANeRV significantly improves reconstruction performance and excels in multiple tasks, including video compression, inpainting, and interpolation, outperforming existing NeRV methods.
https://arxiv.org/abs/2504.06755
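The frequency-separation step can be illustrated with a one-level 2D Haar decomposition; the paper's actual wavelet choice, per-band enhancement modules, and gated fusion are not reproduced here.

```python
import torch

def haar_dwt2(x):
    """x: (B, C, H, W) with even H, W -> (LL, LH, HL, HH), each (B, C, H/2, W/2)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2    # low-frequency approximation
    lh = (a - b + c - d) / 2    # horizontal detail
    hl = (a + b - c - d) / 2    # vertical detail
    hh = (a - b - c + d) / 2    # diagonal detail
    return ll, lh, hl, hh

frame = torch.rand(1, 3, 64, 64)
ll, lh, hl, hh = haar_dwt2(frame)
print(ll.shape)   # low-frequency content; lh/hl/hh carry the fine spatial detail
```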