Existing image restoration approaches typically employ extensive networks specifically trained for designated degradations. Despite being effective, such methods inevitably entail considerable storage costs and computational overhead due to the reliance on task-specific networks. In this work, we go beyond this well-established framework and exploit the inherent commonalities among image restoration tasks. The primary objective is to identify components that are shareable across restoration tasks and augment the shared components with modules specifically trained for individual tasks. Towards this goal, we propose AdaIR, a novel framework that enables low storage cost and efficient training without sacrificing performance. Specifically, a generic restoration network is first constructed through self-supervised pre-training using synthetic degradations. After the pre-training phase, adapters are trained to adapt the pre-trained network to specific degradations. AdaIR requires training only lightweight, task-specific modules, ensuring a more efficient storage and training regimen. We conduct extensive experiments to validate the effectiveness of AdaIR and analyze the influence of the pre-training strategy on discovering shareable components. The results show that AdaIR achieves outstanding performance on multi-task restoration while utilizing significantly fewer parameters (1.9 MB) and less training time (7 hours) for each restoration task. The source code and trained models will be released.
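As a rough illustration of the adapter idea (not AdaIR's actual architecture; the layer sizes and the low-rank residual form below are assumptions for the sketch), one can freeze a shared pre-trained layer and train only a small task-specific adapter, so per-task storage is a tiny fraction of the shared network:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 256, 8  # feature width and adapter bottleneck (hypothetical sizes)

# Shared, frozen component: pre-trained once, reused by every restoration task.
W_shared = rng.standard_normal((d, d)) / np.sqrt(d)

def adapted_forward(x, A_down, A_up):
    """Frozen shared layer plus a lightweight task-specific residual adapter."""
    h = x @ W_shared
    return h + (x @ A_down) @ A_up  # only A_down / A_up are trained per task

# One adapter per task: two low-rank matrices instead of a full d x d layer.
A_down = rng.standard_normal((d, r)) * 0.01
A_up = np.zeros((r, d))  # zero-init: the adapter starts as a no-op on the shared net

x = rng.standard_normal((4, d))
y = adapted_forward(x, A_down, A_up)

shared_params = W_shared.size            # stored once
adapter_params = A_down.size + A_up.size # stored per task
```

With these toy sizes, each adapter holds well under a tenth of the shared layer's parameters, which is the storage trade-off the abstract describes.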
https://arxiv.org/abs/2404.11475
Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, and video frame restoration. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows a general paradigm consisting of a flow estimator and a refinement module, while incorporating carefully designed components. First, we adopt depth-wise convolution with large kernels in the flow estimator, which simultaneously reduces the parameters and enlarges the receptive field for encoding rich context and handling complex motion. Second, diverging from the common UNet-style (encoder-decoder) design for the refinement module, which we find redundant, our decoder-only refinement module directly enhances the result from coarse to fine features, offering a more efficient process. In addition, to address the challenge of handling high-definition frames, we introduce an innovative HD-aware augmentation strategy during training, leading to consistent improvements on HD images. Extensive experiments are conducted on diverse datasets: Vimeo90K, UCF101, Xiph, and SNU-FILM. The results demonstrate that our approach achieves state-of-the-art performance with clear improvement while requiring far fewer FLOPs and parameters, reaching a better balance between efficiency and quality.
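The parameter saving from depth-wise convolution is simple arithmetic (the channel and kernel sizes below are illustrative, not the paper's actual configuration): a depth-wise layer scales with `C·k²` rather than `C_in·C_out·k²`, so even a much larger kernel can cost fewer weights than a standard 3x3 convolution:

```python
def conv_params(c_in, c_out, k, depthwise=False):
    """Weight count of a 2D convolution layer (bias terms ignored)."""
    return c_in * k * k if depthwise else c_in * c_out * k * k

c = 64  # hypothetical channel width
standard_3x3 = conv_params(c, c, 3)          # 64 * 64 * 9 = 36864 weights
depthwise_7x7 = conv_params(c, c, 7, True)   # 64 * 49     = 3136 weights
```

Despite the 7x7 kernel's much larger receptive field, the depth-wise layer here uses over 10x fewer weights than the standard 3x3 layer.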
https://arxiv.org/abs/2404.11108
In this paper, we address the Bracket Image Restoration and Enhancement (BracketIRE) task using a novel framework, which requires restoring a high-quality high dynamic range (HDR) image from a sequence of noisy, blurred, and low dynamic range (LDR) multi-exposure RAW inputs. To overcome this challenge, we present the IREANet, which improves the multiple exposure alignment and aggregation with a Flow-guide Feature Alignment Module (FFAM) and an Enhanced Feature Aggregation Module (EFAM). Specifically, the proposed FFAM incorporates the inter-frame optical flow as guidance to facilitate the deformable alignment and spatial attention modules for better feature alignment. The EFAM further employs the proposed Enhanced Residual Block (ERB) as a foundational component, wherein a unidirectional recurrent network aggregates the aligned temporal features to better reconstruct the results. To improve model generalization and performance, we additionally employ the Bayer preserving augmentation (BayerAug) strategy to augment the multi-exposure RAW inputs. Our experimental evaluations demonstrate that the proposed IREANet shows state-of-the-art performance compared with previous methods.
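One common formulation of Bayer-preserving flipping (the paper's exact BayerAug recipe may differ; this is a sketch of the general idea) is to flip the mosaic and then drop one boundary column so the RGGB phase of every remaining pixel is unchanged. The toy mosaic below encodes each pixel's CFA phase as its value, which makes the invariant easy to check:

```python
import numpy as np

def bayer_pattern(h, w):
    """Encode the CFA phase of each pixel: 0=R, 1/2=G, 3=B for an RGGB mosaic."""
    i, j = np.mgrid[0:h, 0:w]
    return (i % 2) * 2 + (j % 2)

def bayer_preserving_hflip(raw):
    """Horizontal flip that keeps the RGGB phase: flip, then drop the first column."""
    return raw[:, ::-1][:, 1:]

raw = bayer_pattern(6, 8).astype(float)  # toy mosaic whose values are its own phases
aug = bayer_preserving_hflip(raw)
```

A plain flip of an even-width mosaic would swap the column parity (RGGB becomes GRBG); discarding one column after the flip restores it, so the augmented RAW can be fed to the same demosaicing-aware model.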
https://arxiv.org/abs/2404.10358
In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image restoration, where a model must perceive and remove specific degradation types specified by human commands in images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address this, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples, each with a degraded image, a text prompt for specific degradation removal, and the restored image. RFIR covers five basic degradation types: blur, rain, haze, low light, and snow, while six main sub-categories are included for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives the degradation types in a degraded image and removes the specific degradation indicated by the text prompt. TransRFIR is based on two devised attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA), which introduce the agent token and achieve linear complexity, incurring lower computational cost than vanilla self-attention and cross-attention while obtaining competitive performance. Our TransRFIR achieves state-of-the-art performance compared with other counterparts and proves to be an effective architecture for image restoration. We release our project at this https URL.
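The linear-complexity idea behind agent attention can be sketched in a few lines (this is a single-head toy, not the MHASA/MHACA modules themselves; sizes are illustrative): a small set of `m` agent tokens first summarizes the sequence, and queries then attend to the `m` agents instead of all `N` keys, replacing the O(N²) attention matrix with two O(N·m) ones:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(q, k, v, agents):
    """Two small attentions via m agent tokens: O(N*m) instead of O(N^2)."""
    d = q.shape[-1]
    # Step 1: agents aggregate the whole sequence, (m, N) @ (N, d) -> (m, d).
    agent_v = softmax(agents @ k.T / np.sqrt(d)) @ v
    # Step 2: queries attend to the m agents, (N, m) @ (m, d) -> (N, d).
    return softmax(q @ agents.T / np.sqrt(d)) @ agent_v

rng = np.random.default_rng(0)
n, m, d = 64, 8, 16  # sequence length, agent tokens, head dim (toy sizes)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
agents = rng.standard_normal((m, d))
out = agent_attention(q, k, v, agents)
```

Since `m` is a fixed small constant, the cost grows linearly in the sequence length `n`, which is the complexity claim in the abstract.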
https://arxiv.org/abs/2404.10342
Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of the generated images and a lack of effective out-of-domain generalization. Image generation methods represented by the diffusion model provide strong priors for visual tasks and have proven effective for image restoration. Leveraging the image priors of the Stable Diffusion (SD) model, we achieve omnidirectional image super-resolution with both fidelity and realness, dubbed OmniSSR. First, we transform the equirectangular projection (ERP) images into tangent projection (TP) images, whose distribution approximates the planar image domain. Then, we use SD to iteratively sample initial high-resolution results. At each denoising iteration, we further correct and update the initial results using the proposed Octadecaplex Tangent Information Interaction (OTII) and Gradient Decomposition (GD) techniques to ensure better consistency. Finally, the TP images are transformed back to obtain the final high-resolution results. Our method is zero-shot, requiring no training or fine-tuning. Experiments on two benchmark datasets demonstrate the effectiveness of our proposed method.
https://arxiv.org/abs/2404.10312
Image restoration, which aims to recover high-quality images from their corrupted counterparts, often faces the challenge of being an ill-posed problem that allows multiple solutions for a single input. However, most deep learning based works simply employ the l1 loss to train their networks in a deterministic way, resulting in over-smoothed predictions with inferior perceptual quality. In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective, emphasizing the learning of distributions rather than individual pixel values. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. To make this spatial entropy differentiable, we employ kernel density estimation (KDE) to approximate the probability of each pixel taking specific intensity values within its neighborhood. Specifically, we equip diffusion models with this entropy and aim for superior accuracy and enhanced perceptual quality over the l1-based noise matching loss. In the experiments, we evaluate the proposed method for low-light enhancement on two datasets as well as in the NTIRE 2024 challenge. All these results illustrate the effectiveness of our statistics-based entropy loss. Code is available at this https URL.
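A minimal sketch of the KDE-smoothed entropy idea (the bin count, bandwidth, and patch size below are assumptions, and the paper's loss operates on prediction/target pairs rather than a single patch): every pixel is softly assigned to intensity bins with a Gaussian kernel, which yields a smooth, differentiable histogram whose Shannon entropy can be computed:

```python
import numpy as np

def kde_entropy(patch, bins=16, bandwidth=0.05):
    """Entropy of a Gaussian-KDE intensity histogram (a differentiable surrogate)."""
    centers = (np.arange(bins) + 0.5) / bins
    # Soft assignment of every pixel to every bin: this is the KDE kernel,
    # so the histogram is a smooth function of the pixel values.
    w = np.exp(-0.5 * ((patch.reshape(-1, 1) - centers) / bandwidth) ** 2)
    p = w.sum(axis=0)
    p = p / p.sum()
    return -(p * np.log(p + 1e-12)).sum()

rng = np.random.default_rng(0)
flat = np.full((8, 8), 0.5)             # constant patch: mass in few bins
noisy = rng.uniform(0.0, 1.0, (8, 8))   # spread intensities: mass in many bins
```

An over-smoothed (nearly constant) patch concentrates its intensity distribution and scores low entropy, while a texture-rich patch scores high, which is the statistical signal the loss is built to match.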
https://arxiv.org/abs/2404.09735
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features that assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.
https://arxiv.org/abs/2404.09732
Human beings construct a perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data are often sparse and computing resources are limited. However, traditional signal reconstruction methods on digital computers face both software and hardware challenges. On the software front, difficulties arise from storage inefficiencies in conventional explicit signal representations. Hardware obstacles include the von Neumann bottleneck, which limits data transfer between the CPU and memory, and the limitations of CMOS circuits in supporting parallel processing. We propose a systematic approach with software-hardware co-optimization for signal reconstruction from sparse inputs. Software-wise, we employ a neural field to implicitly represent signals via neural networks, which is further compressed using low-rank decomposition and structured pruning. Hardware-wise, we design a resistive memory-based computing-in-memory (CIM) platform, featuring a Gaussian Encoder (GE) and an MLP Processing Engine (PE). The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit. We demonstrate the system's efficacy on a 40nm 256Kb resistive memory-based in-memory computing macro, achieving substantial energy-efficiency and parallelism improvements without compromising reconstruction quality in tasks such as 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes. This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
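The low-rank decomposition step of the software-side compression can be sketched with a truncated SVD (the matrix sizes and rank below are toy assumptions, not the paper's network dimensions): a weight matrix `W` is replaced by two thin factors, cutting storage roughly from `d_out·d_in` to `r·(d_out + d_in)`:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 128, 128, 8  # hypothetical layer sizes and target rank

# A weight matrix that is exactly low-rank, standing in for a compressible layer.
W = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))

# Truncated SVD: store U_r, s_r, V_r instead of the full W.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lr = (U[:, :r] * s[:r]) @ Vt[:r]

full_params = W.size                                # 128*128 = 16384
factored_params = U[:, :r].size + r + Vt[:r].size   # 1024 + 8 + 1024 = 2056
err = np.linalg.norm(W - W_lr) / np.linalg.norm(W)
```

For a layer that really is (close to) rank `r`, the factored form is about 8x smaller here with negligible reconstruction error; real MLP layers are only approximately low-rank, so the rank trades storage against accuracy.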
https://arxiv.org/abs/2404.09613
In medical imaging, accurate image segmentation is crucial for quantifying diseases, assessing prognosis, and evaluating treatment outcomes. However, existing methods lack an in-depth integration of global and local features, failing to pay special attention to abnormal regions and boundary details in medical images. To this end, we present a novel deep learning-based approach, MIPC-Net, for precise boundary segmentation in medical images. Our approach, inspired by radiologists' working patterns, features two distinct modules: (i) \textbf{Mutual Inclusion of Position and Channel Attention (MIPC) module}: To enhance the precision of boundary segmentation in medical images, we introduce the MIPC module, which enhances the focus on channel information when extracting position features and vice versa; (ii) \textbf{GL-MIPC-Residue}: To improve the restoration of medical images, we propose the GL-MIPC-Residue, a global residual connection that enhances the integration of the encoder and decoder by filtering out invalid information and restoring the most effective information lost during the feature extraction process. We evaluate the performance of the proposed model using metrics such as Dice coefficient (DSC) and Hausdorff Distance (HD) on three publicly accessible datasets: Synapse, ISIC2018-Task, and Segpc. Our ablation study shows that each module contributes to improving the quality of segmentation results. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods across all metrics on the benchmark datasets, notably achieving a 2.23mm reduction in HD on the Synapse dataset, strongly evidencing our model's enhanced capability for precise image boundary segmentation. Codes will be available at this https URL.
https://arxiv.org/abs/2404.08201
Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introduces a novel learning-based framework to address the gap. Our approach jointly optimizes wavefront modulations and a computationally lightweight feedforward "proxy" reconstruction network. This network is trained to recover scenes obscured by scattering, using measurements that are modified by these modulations. The learned modulations produced by our framework generalize effectively to unseen scattering scenarios and exhibit remarkable versatility. During deployment, the learned modulations can be decoupled from the proxy network to augment other more computationally expensive restoration algorithms. Through extensive experiments, we demonstrate our approach significantly advances the state of the art in imaging through scattering media. Our project webpage is at this https URL.
https://arxiv.org/abs/2404.07985
Blind-spot networks (BSNs) have been prevalent network architectures in self-supervised image denoising (SSID). Existing BSNs are mostly built with convolution layers. Although transformers offer potential solutions to the limitations of convolutions and have demonstrated success in various image restoration tasks, their attention mechanisms may violate the blind-spot requirement, thus restricting their applicability in SSID. In this paper, we present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators to meet the blind-spot requirement. Specifically, TBSN follows the architectural principles of dilated BSNs, and incorporates spatial as well as channel self-attention layers to enhance the network capability. For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution. For channel self-attention, we observe that it may leak blind-spot information when the channel number is greater than the spatial size in the deep layers of multi-scale architectures. To eliminate this effect, we divide the channels into several groups and perform channel attention separately. Furthermore, we introduce a knowledge distillation strategy that distills TBSN into smaller denoisers to improve computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN largely extends the receptive field and exhibits favorable performance against state-of-the-art SSID methods. The code and pre-trained models will be publicly available at this https URL.
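The channel-grouping idea can be sketched as follows (a toy single-layer version with illustrative sizes, not TBSN's actual module): channel attention is computed inside each group only, so information from one group's channels cannot mix into another's, limiting how much blind-spot information a C x C attention matrix can carry:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_channel_attention(x, groups):
    """Channel self-attention applied per channel group (one C/g x C/g block each)."""
    n, c = x.shape
    out = np.empty_like(x)
    for g in np.split(np.arange(c), groups):
        xg = x[:, g]                            # (N, C/g) features of this group
        attn = softmax(xg.T @ xg / np.sqrt(n))  # (C/g, C/g) channel attention
        out[:, g] = xg @ attn
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))  # N spatial positions smaller than C channels
y = grouped_channel_attention(x, groups=4)

# Groups are independent: perturbing later channels leaves group 0 unchanged.
x2 = x.copy()
x2[:, 8:] *= -1.0
y2 = grouped_channel_attention(x2, groups=4)
```

The final check demonstrates the isolation property: flipping channels 8-31 does not alter the output of channels 0-7, which is what prevents cross-group leakage.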
https://arxiv.org/abs/2404.07846
Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously. Blind image decomposition was proposed to tackle this issue; however, its effectiveness heavily relies on the accurate estimation of each component. Although diffusion-based models exhibit strong generative abilities in image restoration tasks, they may generate irrelevant content when the degraded images are severely corrupted. To address these issues, we leverage physical constraints to guide the whole restoration process, where a mixed degradation model based on the atmosphere scattering model is constructed. We then formulate our Joint Conditional Diffusion Model (JCDM), which incorporates the degraded image and a degradation mask to provide precise guidance. To achieve better color and detail recovery, we further integrate a refinement network to reconstruct the restored image, in which an Uncertainty Estimation Block (UEB) is employed to enhance the features. Extensive experiments performed on both multi-weather and weather-specific datasets demonstrate the superiority of our method over state-of-the-art competing methods.
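For context on the physical constraint, the atmosphere scattering model writes the hazy observation as I = J·t + A·(1 − t), with scene radiance J, transmission t, and atmospheric light A. A minimal sketch (scalar t and A for brevity; in practice t is per-pixel and unknown, which is why the paper uses the model as guidance rather than a closed-form inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, (4, 4, 3))  # clean scene radiance (toy image)
t = 0.6                               # transmission (illustrative scalar)
A = 0.9                               # global atmospheric light

# Forward degradation: the atmosphere scattering model.
I = J * t + A * (1.0 - t)

# With known (t, A) the model inverts exactly; JCDM instead uses the model
# and a degradation mask to condition a diffusion process.
J_hat = (I - A * (1.0 - t)) / t
```

The exact inverse only exists when t and A are known and noise-free, which real weather never grants; the physical model therefore serves as a constraint on the generative restoration, not as the restorer itself.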
https://arxiv.org/abs/2404.07770
Deep subspace clustering methods are now prominent in clustering, typically using fully connected networks and a self-representation loss function. However, these methods often struggle with overfitting and lack interpretability. In this paper, we explore an alternative clustering approach based on deep unfolding. By unfolding iterative optimization methods into neural networks, this approach offers enhanced interpretability and reliability compared to data-driven deep learning methods, and greater adaptability and generalization than model-based approaches. Hence, unfolding has become widely used in inverse imaging problems, such as image restoration, reconstruction, and super-resolution, but has not yet been sufficiently explored in the context of clustering. In this work, we introduce an innovative clustering architecture for hyperspectral images (HSI) by unfolding an iterative solver based on the Alternating Direction Method of Multipliers (ADMM) for sparse subspace clustering. To our knowledge, this is the first attempt to apply unfolded ADMM for computing the self-representation matrix in subspace clustering. Moreover, our approach captures the structural characteristics of HSI data well by employing the K-nearest-neighbors algorithm as part of a structure preservation module. Experimental evaluation on three established HSI datasets clearly shows the potential of the unfolding approach in HSI clustering and even demonstrates superior performance compared to state-of-the-art techniques.
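A rough sketch of the solver being unfolded (hyperparameters, the zero-diagonal projection, and the fixed iteration count below are assumptions for illustration; in the unfolded network these quantities become learned layers): ADMM alternates a least-squares update, an l1 soft-thresholding update, and a dual update to compute a sparse self-representation matrix Z with X ≈ XZ:

```python
import numpy as np

def soft(x, tau):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unfolded_admm_ssc(X, layers=30, rho=10.0, lam=100.0):
    """'layers' unfolded ADMM steps for min lam/2 ||X - XZ||^2 + ||Z||_1, diag(Z)=0."""
    n = X.shape[1]
    G = lam * (X.T @ X)
    inv = np.linalg.inv(G + rho * np.eye(n))  # in the network, a learnable layer
    Z = np.zeros((n, n))
    U = np.zeros((n, n))
    for _ in range(layers):
        C = inv @ (G + rho * (Z - U))         # least-squares (self-representation) step
        Z = soft(C + U, 1.0 / rho)            # sparsity step
        np.fill_diagonal(Z, 0.0)              # forbid the trivial Z = I solution
        U = U + C - Z                         # dual (multiplier) update
    return Z

rng = np.random.default_rng(0)
# Two independent 2-D subspaces in R^6, five points each.
B1, B2 = rng.standard_normal((6, 2)), rng.standard_normal((6, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 5)), B2 @ rng.standard_normal((2, 5))])
Z = unfolded_admm_ssc(X)
res = np.linalg.norm(X - X @ Z) / np.linalg.norm(X)
```

Each loop iteration corresponds to one network layer in the unfolded architecture; spectral clustering on the affinity |Z| + |Zᵀ| would then produce the cluster labels.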
https://arxiv.org/abs/2404.07112
Many technical solutions are bio-inspired. Octopus-inspired robotic arms belong to continuum robots, which are used in minimally invasive surgery or for technical system restoration in difficult-to-access areas. Continuum robot missions are bounded by their motions, whereby the motion of the robots is controlled by humans via wireless communication. In case of a lost connection, robot autonomy is required. Distributed control and distributed decision-making mechanisms based on artificial intelligence approaches can be a promising solution for achieving autonomy of technical systems and for increasing their resilience. However, these methods are not yet well investigated. Octopuses are the living example of natural distributed intelligence, but their learning and decision-making mechanisms are also not yet fully investigated and understood. Our major interest is investigating mechanisms of Distributed Artificial Intelligence as a basis for improving the resilience of complex systems. We decided to use a physical continuum robot prototype that is able to perform some basic movements for our research. The idea is to research how a technical system can be empowered to combine movements into sequences of motions by itself. For the experimental investigations, a suitable physical prototype has to be selected, and its motion control has to be implemented and automated. In this paper, we give an overview combining different fields of research, such as Distributed Artificial Intelligence and continuum robots, based on 98 publications. We provide a detailed description of the basic motion control models of continuum robots based on the literature reviewed, discuss different aspects of autonomy, and give an overview of physical prototypes of continuum robots.
https://arxiv.org/abs/2404.06171
Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, that leverages quantized priors and image refinement to address these challenges. In particular, we reframe LLIE as learning an image-to-code mapping from low-light images to a discrete codebook that has been learned from high-quality images. To enhance this process, a Semantic Embedding Module (SEM) is introduced to integrate semantic information with low-level features, and a Codebook Shift (CS) mechanism is designed to adapt the pre-learned codebook to better suit the distinct characteristics of our low-light dataset. Additionally, we present an Interactive Feature Transformation (IFT) module to refine texture and color information during image reconstruction, allowing for interactive enhancement based on user preferences. Extensive experiments on both real-world and synthetic benchmarks demonstrate that the incorporation of prior knowledge and controllable information transfer significantly enhances LLIE performance in terms of quality and fidelity. The proposed CodeEnhance exhibits superior robustness to various degradations, including uneven illumination, noise, and color distortion.
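The image-to-code mapping rests on vector quantization against a learned codebook; a minimal sketch of that lookup step (codebook size, feature dimension, and the nearest-neighbor rule are illustrative here, and CodeEnhance additionally shifts the codebook and refines features, which this omits):

```python
import numpy as np

def quantize(z, codebook):
    """Map each feature vector to its nearest codebook entry (the VQ step)."""
    d2 = ((z[:, None, :] - codebook[None]) ** 2).sum(-1)  # (N, K) squared distances
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.standard_normal((32, 8))  # K=32 codes, stand-in for the HQ prior
# Features that are noisy versions of codes 3, 17, 3 (a stand-in for encoder output).
z = codebook[[3, 17, 3]] + 0.01 * rng.standard_normal((3, 8))
zq, idx = quantize(z, codebook)
```

Because the decoder only ever sees codebook entries learned from high-quality images, small perturbations in the encoded low-light features snap back to clean codes, which is the prior the abstract leverages.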
https://arxiv.org/abs/2404.05253
The electrocardiogram (ECG) is an essential tool for diagnosing heart disease, with computer-aided systems improving diagnostic accuracy and reducing healthcare costs. Despite advancements, existing systems often miss rare cardiac anomalies that could be precursors to serious, life-threatening issues or alterations in the cardiac macro/microstructure. We address this gap by focusing on self-supervised anomaly detection (AD), training exclusively on normal ECGs to recognize deviations indicating anomalies. We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies. It proposes a novel masking and restoration technique alongside a multi-scale cross-attention module, enhancing the model's ability to integrate global and local signal features. The framework emphasizes accurate localization of anomalies within ECG signals, ensuring the method's clinical relevance and reliability. To reduce the impact of individual variability, the approach further incorporates crucial patient-specific information from ECG reports, such as age and gender, thus enabling accurate identification of a broad spectrum of cardiac anomalies, including rare ones. Utilizing an extensive dataset of 478,803 ECG graphic reports from real-world clinical practice, our method has demonstrated exceptional effectiveness in AD across all tested conditions, regardless of their frequency of occurrence, significantly outperforming existing models. It achieved superior performance metrics, including an AUROC of 91.2%, an F1 score of 83.7%, a sensitivity rate of 84.2%, a specificity of 83.0%, and a precision of 75.6% with a fixed recall rate of 90%. It has also demonstrated robust localization capabilities, with an AUROC of 76.5% and a Dice coefficient of 65.3% for anomaly localization.
https://arxiv.org/abs/2404.04935
The key success of existing video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information, which is usually achieved by a recurrent propagation module with an alignment module. However, inaccurate alignment usually leads to aligned features with significant artifacts, which are accumulated during propagation and thus affect video restoration. Moreover, propagation modules only propagate features of the same timestep forward or backward, which may fail in cases of complex motion or occlusion, limiting their performance for high-quality frame restoration. To address these issues, we propose a collaborative feedback discriminative (CFD) method to correct inaccurately aligned features and model long-range spatial and temporal information for better video reconstruction. In detail, we develop a discriminative alignment correction (DAC) method to adaptively explore information and reduce the influence of artifacts caused by inaccurate alignment. Then, we propose a collaborative feedback propagation (CFP) module that employs feedback and gating mechanisms to better explore the spatial and temporal information of different timestep features from forward and backward propagation simultaneously. Finally, we embed the proposed DAC and CFP into commonly used VSR networks to verify the effectiveness of our method. Quantitative and qualitative experiments on several benchmarks demonstrate that our method can improve the performance of existing VSR models while maintaining lower model complexity. The source code and pre-trained models will be available at \url{this https URL}.
https://arxiv.org/abs/2404.04745
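The gating idea in the CFP module — softly weighting forward- and backward-propagated features before merging them with the current frame — can be sketched as follows. The weights, shapes, and fusion rule below are placeholders of my own; the actual CFP module also feeds fused features back across timesteps:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(curr, fwd, bwd, w_gate):
    """Soft-gated blend of forward- and backward-propagated features,
    added back onto the current-frame features.

    `w_gate` stands in for learned gate weights; conditioning the gate
    on all three feature sets lets the network suppress whichever
    propagation direction is unreliable (e.g. due to occlusion).
    """
    stacked = np.concatenate([curr, fwd, bwd], axis=-1)
    gate = sigmoid(stacked @ w_gate)  # per-channel weights in (0, 1)
    return curr + gate * fwd + (1.0 - gate) * bwd

rng = np.random.default_rng(0)
n, c = 4, 8  # tokens, channels
curr, fwd, bwd = (rng.normal(size=(n, c)) for _ in range(3))
out = gated_fusion(curr, fwd, bwd, 0.1 * rng.normal(size=(3 * c, c)))
```

The sigmoid gate keeps the blend a convex combination of the two propagation directions, which is what lets unreliable aligned features be down-weighted rather than accumulated.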
We propose Diverse Restormer (DART), a novel image restoration method that effectively integrates information from various sources (long sequences, local and global regions, feature dimensions, and positional dimensions) to address restoration challenges. While Transformer models have demonstrated excellent performance in image restoration thanks to their self-attention mechanism, they face limitations in complex scenarios. Leveraging recent advancements in Transformers and various attention mechanisms, our method uses customized attention mechanisms to enhance overall performance. DART, our novel network architecture, employs windowed attention to mimic the selective focusing mechanism of the human eye. By dynamically adjusting receptive fields, it optimally captures the fundamental features crucial for image restoration. A balance between efficiency and performance is achieved through the LongIR attention mechanism for long-sequence image restoration. Integrating attention mechanisms across feature and positional dimensions further enhances the recovery of fine details. Evaluation across five restoration tasks consistently positions DART at the forefront. Upon acceptance, we commit to providing publicly accessible code and models to ensure reproducibility and facilitate further research.
https://arxiv.org/abs/2404.04617
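Windowed attention, the building block DART uses to mimic selective focusing, restricts self-attention to small non-overlapping patches of the feature map. Here is a minimal sketch with identity Q/K/V projections as toy stand-ins for the learned ones; the function names and sizes are my own, not from the paper:

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping
    (win*win, C) windows, the unit over which windowed
    self-attention is computed."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_attention(x, win):
    """Softmax self-attention restricted to each window
    (identity projections stand in for learned Q, K, V)."""
    windows = window_partition(x, win)
    scale = windows.shape[-1] ** -0.5
    attn = windows @ windows.transpose(0, 2, 1) * scale
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
    return attn @ windows

x = np.random.default_rng(1).normal(size=(8, 8, 4))
out = window_attention(x, win=4)  # 4 windows of 16 tokens each
```

Because attention is computed per window, cost grows linearly in the number of windows rather than quadratically in image size, which is what makes long-sequence variants such as LongIR tractable.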
Maintaining a robust communication network plays an important role in the success of a multi-robot team jointly performing an optimization task. A key characteristic of a robust cooperative multi-robot system is the ability to repair the communication topology in the case of robot failure. In this paper, we focus on the Fast k-connectivity Restoration (FCR) problem, which aims to repair a network to make it k-connected with minimum robot movement. We develop a Quadratically Constrained Program (QCP) formulation of the FCR problem, which provides a way to optimally solve the problem, but cannot handle large instances due to high computational overhead. We therefore present a scalable algorithm, called EA-SCR, for the FCR problem using graph theoretic concepts. By conducting empirical studies, we demonstrate that the EA-SCR algorithm performs within 10 percent of the optimal while being orders of magnitude faster. We also show that EA-SCR outperforms existing solutions by 30 percent in terms of the FCR distance metric.
https://arxiv.org/abs/2404.03834
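The k-connectivity property the FCR problem restores can be checked directly on small instances: by Menger's theorem, a graph on more than k vertices is k-connected iff removing any k-1 vertices leaves it connected. A brute-force sketch (function names are my own; EA-SCR itself uses more scalable graph-theoretic machinery):

```python
from itertools import combinations

def is_connected(nodes, edges):
    """DFS connectivity over the induced subgraph on `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes

def is_k_connected(nodes, edges, k):
    """Brute-force k-connectivity check: every removal of k-1
    vertices must leave the remaining graph connected."""
    if len(nodes) <= k:
        return False
    return all(
        is_connected(set(nodes) - set(removed), edges)
        for removed in combinations(nodes, k - 1)
    )

# A 4-cycle survives any single removal (2-connected) but removing
# two opposite vertices disconnects it (not 3-connected).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

An FCR solver would use a check like this as the feasibility test while searching for minimum-movement robot placements; the combinatorial cost of the brute-force version is exactly why the paper needs the QCP formulation and the scalable EA-SCR heuristic.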
NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality: it struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored to a specific degradation type, ignoring the generality of restoration. To overcome this limitation, we propose a generic radiance field restoration pipeline, named RaFE, which applies to various types of degradations, such as low resolution, blurriness, noise, compression artifacts, or their combinations. Our approach leverages the success of off-the-shelf 2D restoration methods to recover the multi-view images individually. Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach that uses Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images. Specifically, we adopt a two-level tri-plane architecture, where the coarse level remains fixed to represent the low-quality NeRF, and a fine-level residual tri-plane added to the coarse level is modeled as a distribution with a GAN to capture potential variations in restoration. We validate RaFE on both synthetic and real cases for various restoration tasks, demonstrating superior performance in both quantitative and qualitative evaluations, surpassing other 3D restoration methods specific to a single task. Please see our project website at this https URL.
https://arxiv.org/abs/2404.03654
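The two-level tri-plane lookup at the heart of RaFE can be sketched as follows: a 3D point is projected onto the XY, XZ, and YZ planes, the three features are summed, and a residual tri-plane (one sample from the GAN-modeled distribution) is added to the frozen coarse one. Nearest-neighbour sampling and the random "GAN sample" below are my simplifications; a real implementation would use bilinear interpolation and a trained generator:

```python
import numpy as np

def triplane_features(planes, pts):
    """Nearest-neighbour lookup of 3D points (in [0, 1)^3) on the
    XY, XZ, and YZ planes; the three plane features are summed."""
    R = planes.shape[1]  # plane resolution
    idx = np.minimum((pts * R).astype(int), R - 1)
    xy, xz, yz = planes
    return (xy[idx[:, 0], idx[:, 1]]
            + xz[idx[:, 0], idx[:, 2]]
            + yz[idx[:, 1], idx[:, 2]])

rng = np.random.default_rng(2)
R, C = 32, 8
coarse = rng.normal(size=(3, R, R, C))          # frozen low-quality NeRF
residual = 0.1 * rng.normal(size=(3, R, R, C))  # one "GAN" residual sample
pts = rng.random((5, 3))                        # query points
feats = triplane_features(coarse, pts) + triplane_features(residual, pts)
```

Keeping the coarse planes fixed and sampling only the residual is what lets the GAN express a distribution over plausible restorations rather than a single averaged, blurry one.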