In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like ControlNet for Sketch2Photo and Edge2Image, but with single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at this https URL.
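Below is a minimal sketch of how such an adversarial, CycleGAN-style objective could be wired around two single-step generators. The module names (G_ab, G_ba, D_a, D_b), the least-squares GAN loss, and the cycle weight are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def unpaired_generator_loss(G_ab, G_ba, D_a, D_b, real_a, real_b, lambda_cyc=10.0):
    """One generator update for unpaired translation with single-step generators (toy sketch)."""
    fake_b = G_ab(real_a)                 # a single forward pass, no iterative denoising
    fake_a = G_ba(real_b)
    rec_a = G_ba(fake_b)                  # cycle reconstruction a -> b -> a
    rec_b = G_ab(fake_a)

    logits_b, logits_a = D_b(fake_b), D_a(fake_a)
    adv = F.mse_loss(logits_b, torch.ones_like(logits_b)) + \
          F.mse_loss(logits_a, torch.ones_like(logits_a))   # least-squares GAN loss
    cyc = F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
    return adv + lambda_cyc * cyc         # illustrative weighting only
```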
https://arxiv.org/abs/2403.12036
Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video datasets used to train these models are abundant and diverse, reducing the domain gap between pretraining and fine-tuning. To enhance multi-view consistency, we introduce 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to obtain an explicit global 3D model, and then adopts a sampling strategy that incorporates images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousands of GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is this http URL.
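A simplified, hedged sketch of the 3D-aware sampling loop described above: at each step, a feed-forward reconstruction produces a global 3D model whose renderings are mixed back into the per-view estimates. All interfaces (`denoiser`, `reconstructor`, `renderer`, `renoise`) are assumed placeholders, not the paper's code.

```python
import torch

@torch.no_grad()
def sample_with_3d_guidance(denoiser, reconstructor, renderer, x_T, timesteps, w=0.5):
    x = x_T                                      # noisy multi-view images/latents
    for t in timesteps:                          # descending denoising timesteps
        x0_pred = denoiser(x, t)                 # per-view estimate of the clean images
        gaussians = reconstructor(x0_pred)       # feed-forward global 3D model (e.g. Gaussians)
        rendered = renderer(gaussians)           # re-render every target view from that model
        x0_mixed = (1 - w) * x0_pred + w * rendered   # pull views toward a shared 3D state
        x = denoiser.renoise(x0_mixed, t)        # assumed helper: map back to the next noisy step
    return x
```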
https://arxiv.org/abs/2403.12010
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by denoised smoothing, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
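A minimal sketch of the noise-then-denoise sanitization idea, assuming an off-the-shelf diffusion model with a single-shot denoising interface; `timestep_for_sigma` and `denoise` are hypothetical names used only for illustration.

```python
import torch

def sanitize(train_images, diffusion, sigma=0.25):
    """Noise-then-denoise each training image to wash out bounded poisoning perturbations."""
    x = train_images                                     # tensor in [0, 1], shape (N, C, H, W)
    x_noisy = x + sigma * torch.randn_like(x)            # Gaussian corruption dominates the poison
    t = diffusion.timestep_for_sigma(sigma)              # hypothetical: noise level -> timestep
    x_clean = diffusion.denoise(x_noisy.clamp(0, 1), t)  # one-shot estimate of the clean image
    return x_clean.clamp(0, 1)                           # train the downstream classifier on this
```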
https://arxiv.org/abs/2403.11981
In this work we present denoiSplit, a method to tackle a new analysis task, i.e. the challenge of joint semantic image splitting and unsupervised denoising. This dual approach is particularly relevant for fluorescence microscopy, where semantic image splitting is of great practical value but noise generally hinders the downstream analysis of image content. Image splitting involves dissecting an image into its distinguishable semantic structures. We show that the current state-of-the-art method for this task struggles in the presence of image noise, inadvertently also distributing the noise across the predicted outputs. The method we present here can deal with image noise by integrating an unsupervised denoising sub-task. This integration results in improved semantic image unmixing, even in the presence of notable and realistic levels of imaging noise. A key innovation in denoiSplit is the use of specifically formulated noise models and the suitable adjustment of the KL-divergence loss for the high-dimensional hierarchical latent space we are training. We showcase the performance of denoiSplit across 4 tasks on real-world microscopy images. Additionally, we perform qualitative and quantitative evaluations and compare results to existing benchmarks, demonstrating the effectiveness of denoiSplit: a single Variational Splitting Encoder-Decoder (VSE) Network that uses two suitable noise models to jointly perform semantic splitting and denoising.
https://arxiv.org/abs/2403.11854
In clinical examinations and diagnoses, low-dose computed tomography (LDCT) is crucial for minimizing health risks compared with normal-dose computed tomography (NDCT). However, reducing the radiation dose compromises the signal-to-noise ratio, leading to degraded quality of CT images. To address this, we analyze the LDCT denoising task from a frequency perspective based on experimental results, and then introduce a novel self-supervised CT image denoising method called WIA-LD2ND that uses only NDCT data. The proposed WIA-LD2ND comprises two modules: Wavelet-based Image Alignment (WIA) and Frequency-Aware Multi-scale Loss (FAM). First, WIA is introduced to align NDCT with LDCT by mainly adding noise to the high-frequency components, which is where LDCT and NDCT mainly differ. Second, to better capture high-frequency components and detailed information, the Frequency-Aware Multi-scale Loss (FAM) is proposed to effectively exploit the multi-scale feature space. Extensive experiments on two public LDCT denoising datasets demonstrate that our WIA-LD2ND, which uses only NDCT, outperforms several existing state-of-the-art weakly-supervised and self-supervised methods.
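The alignment step lends itself to a short illustration. The sketch below is an assumption-laden toy, not the released code: it perturbs only the high-frequency wavelet subbands of an NDCT slice, which is where LDCT and NDCT mainly differ; the wavelet choice and noise level are placeholders.

```python
import numpy as np
import pywt

def wavelet_align(ndct_slice, noise_std=0.05, wavelet="haar"):
    """Push an NDCT slice toward LDCT statistics by noising high-frequency subbands only."""
    cA, (cH, cV, cD) = pywt.dwt2(ndct_slice, wavelet)      # one-level 2D DWT
    cH = cH + noise_std * np.random.randn(*cH.shape)       # horizontal detail
    cV = cV + noise_std * np.random.randn(*cV.shape)       # vertical detail
    cD = cD + noise_std * np.random.randn(*cD.shape)       # diagonal detail
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)         # pseudo-LDCT training input
```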
https://arxiv.org/abs/2403.11672
The high performance of denoising diffusion models for image generation has paved the way for their application in unsupervised medical anomaly detection. As diffusion-based methods require a lot of GPU memory and have long sampling times, we present a novel and fast unsupervised anomaly detection approach based on latent Bernoulli diffusion models. We first apply an autoencoder to compress the input images into a binary latent representation. Next, a diffusion model that follows a Bernoulli noise schedule is applied to this latent space and trained to restore binary latent representations from perturbed ones. The binary nature of this diffusion model allows us to identify entries in the latent space that have a high probability of flipping their binary code during the denoising process, which indicates out-of-distribution data. We propose a masking algorithm based on these probabilities, which improves the anomaly detection scores. We achieve state-of-the-art performance compared to other diffusion-based unsupervised anomaly detection algorithms while significantly reducing sampling time and memory consumption. The code is available at this https URL.
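A hedged sketch of how per-entry flip probabilities in the binary latent space could be thresholded into a mask and used to localize anomalies; the `flip_probs` interface and the scoring are assumptions for illustration, not the released code.

```python
import torch

@torch.no_grad()
def anomaly_map(encoder, bernoulli_denoiser, decoder, image, t, threshold=0.5):
    z = encoder(image)                                   # binary latent codes in {0, 1}
    p_flip = bernoulli_denoiser.flip_probs(z, t)         # assumed: per-entry flip probability
    mask = p_flip > threshold                            # likely out-of-distribution entries
    z_restored = torch.where(mask, 1.0 - z, z)           # flip only the masked bits
    healed = decoder(z_restored)                         # pseudo-healthy reconstruction
    return (image - healed).abs().mean(dim=1, keepdim=True)   # per-pixel anomaly score
```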
https://arxiv.org/abs/2403.11667
Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range dependencies. DMs require large prior models and computationally intensive denoising steps. Transformers have powerful modeling capabilities but face challenges due to quadratic complexity with input image size. To address these challenges, we propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks. We utilize a Unet architecture to stack our proposed Omni Selective Scan (OSS) blocks, consisting of an OSS module and an Efficient Feed-Forward Network (EFFN). Our proposed omni selective scan mechanism overcomes the unidirectional modeling limitation of SSMs by efficiently modeling image information flows in all six directions. Furthermore, we conducted a comprehensive evaluation of our VmambaIR across multiple image restoration tasks, including image deraining, single image super-resolution, and real-world image super-resolution. Extensive experimental results demonstrate that our proposed VmambaIR achieves state-of-the-art (SOTA) performance with much fewer computational resources and parameters. Our research highlights the potential of state space models as promising alternatives to the transformer and CNN architectures in serving as foundational frameworks for next-generation low-level visual tasks.
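The omni selective scan covers six directions in the paper; the sketch below conveys the general mechanism with four planar orderings and a placeholder `scan_1d` standing in for the underlying SSM sequence operator.

```python
import torch

def omni_scan(x, scan_1d):
    """x: (B, C, H, W) feature map; scan_1d: any causal sequence op on (B, C, L) tensors."""
    B, C, H, W = x.shape
    orderings = [
        x.flatten(2),                               # row-major, forward
        x.flatten(2).flip(-1),                      # row-major, reversed
        x.transpose(2, 3).flatten(2),               # column-major, forward
        x.transpose(2, 3).flatten(2).flip(-1),      # column-major, reversed
    ]
    outs = []
    for i, seq in enumerate(orderings):
        y = scan_1d(seq)
        if i % 2 == 1:
            y = y.flip(-1)                          # undo the reversal
        if i < 2:
            y = y.reshape(B, C, H, W)
        else:
            y = y.reshape(B, C, W, H).transpose(2, 3)   # undo the transpose
        outs.append(y)
    return torch.stack(outs, dim=0).mean(dim=0)     # merge the directional passes
```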
https://arxiv.org/abs/2403.11423
Remote sensing image super-resolution (SR) is a crucial task to restore high-resolution (HR) images from low-resolution (LR) observations. Recently, the Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstruction by overcoming problems inherent in generative models, such as over-smoothing and mode collapse. However, the high-frequency details generated by DDPM often suffer from misalignment with HR images due to the model's tendency to overlook long-range semantic contexts. This is attributed to the widely used U-Net decoder in the conditional noise predictor, which tends to overemphasize local information, leading to the generation of noise with significant variance during the prediction process. To address these issues, an adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM by incorporating low-frequency semantic information provided by the Transformer. Specifically, a novel adaptive diffusion Transformer decoder (ADTD) is developed to bridge the semantic gap between the encoder and decoder by regulating the noise prediction with the global contextual relationships and long-range dependencies in the diffusion process. Additionally, a residual feature fusion strategy establishes information exchange between the two decoders at multiple levels. As a result, the predicted noise generated by our approach closely approximates that of the real noise distribution. Extensive experiments on two SR and two semantic segmentation datasets confirm the superior performance of the proposed ASDDPM in both SR and the subsequent downstream applications. The source code will be available at this https URL.
https://arxiv.org/abs/2403.11078
Personalization is an important topic in text-to-image generation, especially the challenging multi-concept personalization. Current multi-concept methods struggle with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts within a single image. We propose a novel two-stage sampling solution. The first stage takes charge of layout generation and visual comprehension information collection for handling occlusions. The second one utilizes the acquired visual comprehension information and the designed noise blending to integrate multiple concepts while considering occlusions. We also observe that the initiation denoising timestep for noise blending is the key to identity preservation and layout. Moreover, our method can be combined with various single-concept models, such as LoRA and InstantID, without additional tuning. In particular, LoRA models on this http URL can be exploited directly. Extensive experiments demonstrate that OMG exhibits superior performance in multi-concept personalization.
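A hedged sketch of region-wise noise blending: once denoising has passed an initiation timestep, the noise predictions of single-concept models are pasted into their layout regions. The interfaces and the boolean region masks are placeholders, not OMG's actual API.

```python
import torch

@torch.no_grad()
def blended_noise_pred(base_model, concept_models, region_masks, x_t, t, t_start=800):
    """region_masks: boolean tensors, each broadcastable to the noise prediction."""
    eps = base_model(x_t, t)                          # layout / background prediction
    if t >= t_start:                                  # early steps: settle layout only
        return eps
    for model, mask in zip(concept_models, region_masks):
        eps = torch.where(mask, model(x_t, t), eps)   # paste each concept's prediction into its region
    return eps
```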
https://arxiv.org/abs/2403.10983
Spatially resolved transcriptomics represents a significant advancement in single-cell analysis by offering both gene expression data and their corresponding physical locations. However, this high degree of spatial resolution entails a drawback, as the resulting spatial transcriptomic data at the cellular level is notably plagued by a high incidence of missing values. Furthermore, most existing imputation methods either overlook the spatial information between spots or compromise the overall gene expression data distribution. To address these challenges, our primary focus is on effectively utilizing the spatial location information within spatial transcriptomic data to impute missing values, while preserving the overall data distribution. We introduce stMCDI, a novel conditional diffusion model for spatial transcriptomics data imputation, which employs a denoising network trained using randomly masked data portions as guidance, with the unmasked data serving as conditions. Additionally, it utilizes a GNN encoder to integrate the spatial position information, thereby enhancing model performance. The results obtained from spatial transcriptomics datasets elucidate the performance of our methods relative to existing approaches.
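A minimal sketch, under assumptions, of the self-supervised masking objective: a random portion of the observed expression matrix is hidden, the rest (plus a GNN encoding of spatial neighbors) serves as the condition, and the denoising loss is evaluated on the hidden entries only. The module interfaces and the scalar noise schedule are placeholders.

```python
import torch

def stmcdi_style_step(denoiser, gnn_encoder, expr, coords, adjacency, t, mask_ratio=0.2):
    """expr: (n_spots, n_genes) expression matrix with NaN marking missing entries."""
    observed = ~torch.isnan(expr)
    target = observed & (torch.rand_like(expr) < mask_ratio)     # entries we pretend are missing
    cond = torch.where(target, torch.zeros_like(expr), expr.nan_to_num())
    spatial_ctx = gnn_encoder(cond, coords, adjacency)            # spatial conditioning via a GNN

    noise = torch.randn_like(expr)
    alpha_bar = 0.5                                               # placeholder; taken from the schedule at t
    x_t = alpha_bar ** 0.5 * expr.nan_to_num() + (1 - alpha_bar) ** 0.5 * noise
    eps_hat = denoiser(x_t, t, cond, spatial_ctx)
    return ((eps_hat - noise)[target] ** 2).mean()                # loss only on the masked entries
```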
https://arxiv.org/abs/2403.10863
Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yield incomplete outputs and Denoising Diffusion Probabilistic Models (DDPM) excel at capturing details, our method integrates INR's structural coherence with DDPM's fine-detail enhancement capabilities. We pretrain an INR model to transform 2D axially-projected images into a preliminary 3D volume. This pretrained INR acts as a global prior guiding DDPM's generative process through a linear interpolation between INR outputs and noise inputs. This strategy enriches the diffusion process with structured 3D information, enhancing detail and reducing noise in localized 2D images. By conditioning the diffusion model on the closest 2D projection, MicroDiffusion substantially enhances fidelity in resulting 3D reconstructions, surpassing INR and standard DDPM outputs with unparalleled image quality and structural fidelity. Our code and dataset are available at this https URL.
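A hedged sketch of how the pretrained INR output could serve as a structured starting point and conditioning signal for the diffusion sampler; `inr`, `ddpm.p_sample`, and the mixing weight are placeholders, not the authors' interfaces.

```python
import torch

@torch.no_grad()
def inr_guided_sampling(ddpm, inr, projections_2d, timesteps, mix=0.7):
    prior = inr(projections_2d)                       # preliminary 3D volume from the pretrained INR
    x = mix * prior + (1 - mix) * torch.randn_like(prior)   # linear interpolation of prior and noise
    for t in timesteps:                               # descending denoising timesteps
        x = ddpm.p_sample(x, t, cond=projections_2d)  # condition on the (closest) 2D projections
    return x                                          # refined, depth-resolved 3D volume
```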
https://arxiv.org/abs/2403.10815
In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in retaining the intricate textures of the garment while maintaining the flexibility of pre-trained Stable Diffusion. Our solution involves the development of a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These ASA layers are specifically devised to transfer detailed garment textures, also facilitating the integration of stylized base models for the creation of stylized images. Furthermore, the incorporation of a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision. We also build a novel data engine that produces high-quality synthesized data to preserve the model's ability to follow prompts. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing virtual try-on methods and exhibits high flexibility with broad potential applications in various garment-centric image generation.
https://arxiv.org/abs/2403.10783
Diffusion-based generative models have emerged as powerful tools in the realm of generative modeling. Despite extensive research on denoising across various timesteps and noise levels, a conflict persists regarding the relative difficulties of the denoising tasks. While various studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulties, focusing on convergence behavior and changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps poses challenges characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing from curriculum learning, to enhance the training process of diffusion models. By organizing timesteps or noise levels into clusters and training models with descending orders of difficulty, we facilitate an order-aware training regime, progressing from easier to harder denoising tasks, thereby deviating from the conventional approach of training diffusion models simultaneously across all timesteps. Our approach leads to improved performance and faster convergence by leveraging the benefits of curriculum learning, while maintaining orthogonality with existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation.
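A minimal sketch of the easy-to-hard schedule described above, under the assumption that timesteps are split into contiguous clusters and visited from the highest-noise (easier) cluster down to the lowest (harder); the `denoising_loss` interface is a placeholder.

```python
import torch

def curriculum_phases(T=1000, n_clusters=5):
    """Partition [0, T) into contiguous timestep clusters, ordered easy -> hard."""
    edges = torch.linspace(0, T, n_clusters + 1).long().tolist()
    clusters = [(edges[i], edges[i + 1]) for i in range(n_clusters)]
    return clusters[::-1]                   # highest-timestep (easiest) cluster first

def train_with_curriculum(model, loader, optimizer, epochs_per_phase=2):
    for lo, hi in curriculum_phases():
        for _ in range(epochs_per_phase):
            for x0 in loader:
                t = torch.randint(lo, hi, (x0.shape[0],))   # draw timesteps from this cluster only
                noise = torch.randn_like(x0)
                loss = model.denoising_loss(x0, t, noise)   # placeholder training objective
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```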
https://arxiv.org/abs/2403.10348
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications. Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts. Additionally, they do not offer enough diversity of output images nor image consistency at different scales. The most relevant work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results. Since that model operates in the image space, the larger the resolution of the produced image, the more memory and inference time are required, and it also does not maintain scale-specific consistency. We propose a novel pipeline that can super-resolve an input image or generate a novel image from random noise at arbitrary scales. The method consists of a pretrained auto-encoder, a latent diffusion model, and an implicit neural decoder, together with their learning strategies. The proposed method adopts diffusion processes in a latent space, and is thus efficient while remaining aligned with the output image space decoded by MLPs at arbitrary scales. More specifically, our arbitrary-scale decoder is designed as the symmetric decoder (without up-scaling) of the pretrained auto-encoder followed by a Local Implicit Image Function (LIIF) in series. The latent diffusion process is learnt with the denoising and alignment losses jointly. Errors in output images are backpropagated via the fixed decoder, improving the quality of output images. In extensive experiments using multiple public benchmarks on the two tasks, i.e. image super-resolution and novel image generation at arbitrary scales, the proposed method outperforms relevant methods in metrics of image quality, diversity and scale consistency. It is significantly better than the relevant prior art in inference speed and memory usage.
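The implicit neural decoder can be illustrated with a LIIF-style query: continuous coordinates index the latent feature map and a small MLP maps features plus coordinates to RGB, so the output resolution is arbitrary. The offset handling of full LIIF is simplified away here, and all names are placeholders.

```python
import torch
import torch.nn.functional as F

def liif_style_decode(feat, coords, mlp):
    """feat: (B, C, h, w) latents; coords: (B, N, 2) queries in [-1, 1]; mlp: maps C+2 -> 3."""
    sampled = F.grid_sample(feat, coords.unsqueeze(1), mode="nearest",
                            align_corners=False)             # (B, C, 1, N) nearest latent per query
    sampled = sampled.squeeze(2).permute(0, 2, 1)            # (B, N, C)
    rgb = mlp(torch.cat([sampled, coords], dim=-1))          # decode at arbitrary coordinates
    return rgb                                               # (B, N, 3), any output scale
```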
https://arxiv.org/abs/2403.10255
The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields, yet it faces the significant challenge of high sample annotation costs. To mitigate this, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget. Traditional active learning methods often struggle in this setting due to their inherent bias in batch selection. Furthermore, the recent active finetuning approach has primarily concentrated on aligning the distribution of selected subsets with the overall data pool, focusing solely on diversity. In this paper, we propose a Bi-Level Active Finetuning framework to select the samples for annotation in one shot, which includes two stages: core sample selection for diversity, and boundary sample selection for uncertainty. The process begins with the identification of pseudo-class centers, followed by an innovative denoising method and an iterative strategy for boundary sample selection in the high-dimensional feature space, all without relying on ground-truth labels. Our comprehensive experiments provide both qualitative and quantitative evidence of our method's efficacy, outperforming all the existing baselines.
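A hedged sketch of the two selection stages on frozen features: pseudo-class centers give diverse core samples, and a small margin between the two nearest centers flags uncertain boundary samples. The paper's denoising step and iterative refinement are omitted, and the numeric defaults are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def bilevel_select(features, budget, n_centers=50, core_frac=0.5):
    """features: (N, D) embeddings from the pretrained model; no ground-truth labels are used."""
    km = KMeans(n_clusters=n_centers, n_init=10).fit(features)           # pseudo-class centers
    dists = np.linalg.norm(features[:, None, :] - km.cluster_centers_[None], axis=-1)  # (N, K)

    n_core = int(budget * core_frac)
    core = np.argsort(dists.min(axis=1))[:n_core]                        # diversity: closest to some center

    sorted_d = np.sort(dists, axis=1)
    margin = sorted_d[:, 1] - sorted_d[:, 0]                             # small margin -> boundary sample
    core_set = set(core.tolist())
    boundary = [i for i in np.argsort(margin) if i not in core_set][:budget - n_core]
    return np.concatenate([core, np.asarray(boundary, dtype=int)])
```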
https://arxiv.org/abs/2403.10069
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. However, simultaneously modeling global and local features is rarely explored to enhance HSI denoising. In this letter, we propose a hybrid convolution and attention network (HCANet), which leverages both the strengths of convolution neural networks (CNNs) and Transformers. To enhance the modeling of both global and local features, we have devised a convolution and attention fusion module aimed at capturing long-range dependencies and neighborhood spectral correlations. Furthermore, to improve multi-scale information aggregation, we design a multi-scale feed-forward network to enhance denoising performance by extracting features at different scales. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet. The proposed model is effective in removing various types of complex noise. Our codes are available at this https URL.
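The fusion idea can be sketched as a toy block (not the released HCANet code): a convolutional branch captures neighborhood correlations, a self-attention branch captures long-range dependencies, and a 1x1 convolution merges them with a residual connection. Channel count must be divisible by the head count.

```python
import torch
import torch.nn as nn

class ConvAttnFusion(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(                        # local branch: neighborhood correlations
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)  # global branch
        self.fuse = nn.Conv2d(2 * channels, channels, 1)   # merge the two branches

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        local = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C) token sequence
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(B, C, H, W)
        return x + self.fuse(torch.cat([local, glob], dim=1))   # residual fusion
```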
https://arxiv.org/abs/2403.10067
Controllable spherical panoramic image generation holds substantial application potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce SphereDiffusion, a novel framework that addresses these unique challenges to better generate high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation, reducing FID by around 35% on average in relative terms.
可控制球形全景图像生成在各种领域具有很大的应用潜力。然而,由于固有的球形扭曲和几何特征,导致生成的内容质量较低,因此这是一个具有挑战性的任务。在本文中,我们提出了一个新的框架SphereDiffusion来解决这些独特的问题,以更好地生成高质量和高精度可控制球形全景图像。 对于球形扭曲特征,我们通过文本编码来嵌入失真对象的语义,然后明确地构建文本-对象对应关系,更好地利用平面图像的预训练知识。同时,我们采用了一种可变形的技术来减轻球形扭曲引起的语义偏差。 对于球形几何特征,由于球形旋转不变性,我们通过改进训练过程中的数据多样性和优化目标,使模型更好地学习球形几何特征。此外,我们还增强了扩散模型的去噪过程,使它能够有效利用学习到的几何特征来确保生成图像的边界连续性。 通过使用这些具体技术,Structured3D数据集上的实验表明,SphereDiffusion显著提高了可控制球形图像生成的质量,平均相对降低了约35%的FID。
https://arxiv.org/abs/2403.10044
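To make the rotation-invariance idea in the SphereDiffusion abstract concrete, below is a minimal, hypothetical sketch of spherical rotation used as data augmentation for equirectangular panoramas: a circular shift of the image columns corresponds to a yaw rotation of the underlying sphere, so it enlarges data diversity without changing scene content. The function name and NumPy-based implementation are assumptions for illustration, not the authors' code.

import numpy as np

def random_yaw_rotation(pano, rng):
    # pano: equirectangular panorama of shape (H, W, C).
    # A circular shift of the columns is equivalent to rotating the sphere
    # about its vertical axis; wrap-around keeps the left/right seam continuous.
    h, w, c = pano.shape
    shift = int(rng.integers(0, w))  # yaw angle = 2 * pi * shift / w
    return np.roll(pano, shift, axis=1)

# Example usage: the same shift would be applied to the panorama and its
# conditioning map so the image/condition pair stays aligned.
rng = np.random.default_rng(0)
pano = np.zeros((512, 1024, 3), dtype=np.float32)
augmented = random_yaw_rotation(pano, rng)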
We present a novel image editing scenario termed Text-grounded Object Generation (TOG), defined as generating a new object in a real image, spatially conditioned on textual descriptions. Existing diffusion models exhibit limited spatial perception in complex real-world scenes and rely on additional modalities to enforce constraints, while TOG imposes heightened challenges on scene comprehension under the weak supervision of linguistic information. We propose a universal framework, ST-LDM, based on Swin-Transformer, which can be integrated into any latent diffusion model with training-free backward guidance. ST-LDM comprises a global-perceptual autoencoder with adaptable compression scales and hierarchical visual features, in parallel with a deformable multimodal transformer that generates region-wise guidance for the subsequent denoising process. We transcend the limitation of traditional attention mechanisms, which focus only on existing visual features, by introducing deformable feature alignment to hierarchically refine spatial positioning fused with multi-scale visual and linguistic information. Extensive experiments demonstrate that our model enhances the localization of attention mechanisms while preserving the generative capabilities inherent to diffusion models.
我们提出了一个名为Text-grounded Object Generation(TOG)的新图像编辑场景,其定义为在文本描述的空间约束下在真实图像中生成新的对象。现有的扩散模型在复杂的现实场景中空间感知能力有限,需要依赖额外的模态来施加约束,而TOG在语言信息的弱监督下对场景理解提出了更高的挑战。我们提出了一个基于Swin-Transformer的通用框架ST-LDM,它可以通过无需训练的反向引导集成到任何潜在扩散模型中。ST-LDM包含一个具有可调压缩尺度和分层视觉特征的全局感知自编码器,并与可变形多模态Transformer并行,为后续去噪过程生成区域级引导。我们通过引入可变形特征对齐,融合多尺度视觉和语言信息来分层细化空间定位,从而超越了传统注意力机制仅关注已有视觉特征的局限。大量实验证明,我们的模型在保持扩散模型固有生成能力的同时,增强了注意力机制的定位能力。
https://arxiv.org/abs/2403.10004
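The training-free backward guidance mentioned in the ST-LDM abstract can be pictured as follows: at each denoising step, the latent is nudged along the negative gradient of a region-wise energy before the ordinary scheduler update. The sketch below is generic and assumes diffusers-style interfaces; unet, scheduler, and region_energy are placeholders, not ST-LDM's actual implementation.

import torch

def backward_guided_step(unet, scheduler, latents, t, cond, region_energy, scale=1.0):
    # Track gradients w.r.t. the current noisy latent only.
    latents = latents.detach().requires_grad_(True)
    with torch.enable_grad():
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        # region_energy scores how well the prediction respects the target region
        # (e.g., cross-attention mass inside the grounded box); lower is better.
        energy = region_energy(noise_pred, latents, t)
        grad = torch.autograd.grad(energy, latents)[0]
    guided_latents = latents - scale * grad  # backward-guidance correction
    return scheduler.step(noise_pred, t, guided_latents).prev_sample

Because the correction is applied only at sampling time, no diffusion weights are updated, which is what makes this style of guidance training-free.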
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resulting in lower recognition quality. Despite its importance, addressing label noise for skeleton-based action recognition has been overlooked so far. In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark. Observations reveal that these baselines yield only marginal performance when dealing with sparse skeleton data. Consequently, we introduce a novel methodology, NoiseEraSAR, which integrates global sample selection, co-teaching, and Cross-Modal Mixture-of-Experts (CM-MOE) strategies, aimed at mitigating the adverse impacts of label noise. Our proposed approach demonstrates better performance on the established benchmark, setting new state-of-the-art standards. The source code for this study will be made accessible at this https URL.
从人体姿态中理解人类动作对于与人类共享空间的辅助机器人至关重要,这样它们才能对下一次交互做出明智且安全的决策。然而,对活动序列进行精确的时间定位和标注非常耗时,且得到的标签常常带有噪声。如果不加以有效解决,标签噪声会对模型训练产生负面影响,导致识别质量下降。尽管这一问题很重要,但针对基于骨骼的动作识别的标签噪声问题迄今为止一直被忽视。在本研究中,我们通过构建一个框架来弥补这一空白:该框架将来自不同研究领域的标签去噪策略与成熟的基于骨骼的人体动作识别方法相结合,作为初始基准。观察结果表明,这些基线在处理稀疏骨骼数据时仅能取得有限的性能。因此,我们引入了一种新方法NoiseEraSAR,它结合了全局样本选择、协同教学和跨模态专家混合(CM-MOE)策略,旨在减轻标签噪声的负面影响。我们所提出的方法在既定基准上表现更优,树立了新的最先进水平。本研究的源代码将在该 https URL 公开。
https://arxiv.org/abs/2403.09975
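As an illustration of the co-teaching component referenced in the NoiseEraSAR abstract (one of several strategies the method combines), the sketch below shows the standard small-loss trick: two recognizers each pick the lowest-loss samples in a batch, and each network is updated only on the samples its peer trusts. Names and the keep ratio are illustrative assumptions, not the released implementation.

import torch
import torch.nn.functional as F

def co_teaching_step(net_a, net_b, opt_a, opt_b, skeletons, labels, keep_ratio=0.7):
    logits_a = net_a(skeletons)
    logits_b = net_b(skeletons)
    per_sample_a = F.cross_entropy(logits_a, labels, reduction="none")
    per_sample_b = F.cross_entropy(logits_b, labels, reduction="none")
    k = max(1, int(keep_ratio * labels.numel()))
    # Each network nominates its small-loss (likely clean) samples for the other.
    clean_for_b = torch.topk(per_sample_a, k, largest=False).indices
    clean_for_a = torch.topk(per_sample_b, k, largest=False).indices
    opt_a.zero_grad()
    F.cross_entropy(logits_a[clean_for_a], labels[clean_for_a]).backward()
    opt_a.step()
    opt_b.zero_grad()
    F.cross_entropy(logits_b[clean_for_b], labels[clean_for_b]).backward()
    opt_b.step()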
We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations, traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need for retraining the task-agnostic diffusion model. We evaluate our approach on three settings/applications: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at this https URL.
我们提出了Score-Guided Human Mesh Recovery (ScoreHMR),一种用于解决3D人体姿态和形状重建逆问题的方法。这类逆问题需要将人体模型拟合到图像观测上,传统上通过优化技术来解决。ScoreHMR模仿模型拟合方法,但与图像观测的对齐是通过扩散模型潜在空间中的分数引导来实现的。该扩散模型被训练用于捕获给定输入图像下人体模型参数的条件分布。通过使用任务特定的分数引导其去噪过程,ScoreHMR无需重新训练这一任务无关的扩散模型,即可有效解决各种应用中的逆问题。我们在三种设置/应用中评估了我们的方法:(i)单帧模型拟合;(ii)从多个未标定视角进行重建;(iii)在视频序列中重建人体。在所有设置下,ScoreHMR在流行基准上都一致优于所有优化基线。我们将代码和模型公开在该 https URL 上。
https://arxiv.org/abs/2403.09623
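To convey the score-guidance idea described in the ScoreHMR abstract, here is a hedged sketch of a single guided denoising step: a task-agnostic denoiser predicts clean body-model parameters, and the gradient of a task-specific fitting error (here a 2D keypoint reprojection loss) steers the estimate before it is mapped back to the previous timestep. denoiser, reproject, and the scaling term are assumed placeholders rather than the authors' released API.

import torch

def score_guided_step(denoiser, theta_t, t, img_feat, keypoints_2d, reproject,
                      alpha_bar_t, guidance_scale=1.0):
    theta_t = theta_t.detach().requires_grad_(True)
    with torch.enable_grad():
        # Task-agnostic prediction of clean human-model parameters at timestep t.
        theta_0 = denoiser(theta_t, t, img_feat)
        # Task-specific score: reprojection error between projected joints and detections.
        fit_loss = ((reproject(theta_0) - keypoints_2d) ** 2).mean()
        grad = torch.autograd.grad(fit_loss, theta_t)[0]
    # Guide the denoised estimate with the task gradient; a DDIM-style update
    # would then re-noise the guided theta_0 to the previous timestep.
    return theta_0 - guidance_scale * (1.0 - alpha_bar_t) ** 0.5 * grad

Because only the sampling loop is modified, the same diffusion model can serve single-frame fitting, multi-view reconstruction, and video settings by swapping in the appropriate task-specific loss.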