Bidirectional Consistency Models

2024-03-26 18:40:36
Liangchen Li, Jiajun He

Abstract

Diffusion models (DMs) are capable of generating remarkably high-quality samples by iteratively denoising a random vector, a process that corresponds to moving along the probability flow ordinary differential equation (PF ODE). Interestingly, DMs can also invert an input image to noise by moving backward along the PF ODE, a key operation for downstream tasks such as interpolation and image editing. However, the iterative nature of this process restricts its speed, hindering its broader application. Recently, Consistency Models (CMs) have emerged to address this challenge by approximating the integral of the PF ODE, thereby bypassing the need to iterate. Yet, the absence of an explicit ODE solver complicates the inversion process. To resolve this, we introduce the Bidirectional Consistency Model (BCM), which learns a single neural network that enables both forward and backward traversal along the PF ODE, efficiently unifying generation and inversion tasks within one framework. Notably, our proposed method enables one-step generation and inversion while also allowing the use of additional steps to enhance generation quality or reduce reconstruction error. Furthermore, by leveraging our model's bidirectional consistency, we introduce a sampling strategy that can improve FID while preserving the generated image content. We further showcase our model's capabilities in several downstream tasks, such as interpolation and inpainting, and present demonstrations of potential applications, including blind restoration of compressed images and defending against black-box adversarial attacks.
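
The core idea of the abstract, a single function that jumps between any two noise levels on one PF ODE trajectory in either direction, can be illustrated on a toy problem where that function has a closed form. The sketch below is not the authors' network or training method. It assumes 1-D Gaussian data N(MU, SIGMA_D^2), the EDM-style noising x_t = x_0 + t·ε used by consistency models, and a map of the form f(x_t, t, u); `MU`, `SIGMA_D`, `T_MAX` and all function names are hypothetical. Under these assumptions (x_t - MU)/sqrt(SIGMA_D^2 + t^2) stays constant along each trajectory, so the exact map can be written down and checked against a slow Euler solve of the PF ODE, the iteration that a CM or BCM is meant to bypass.

```python
import math
import random

# Hypothetical toy setup (not from the paper): 1-D data x_0 ~ N(MU, SIGMA_D^2)
# and EDM-style noising x_t = x_0 + t * eps, eps ~ N(0, 1).
MU = 2.0
SIGMA_D = 0.5
T_MAX = 80.0  # largest noise level; an assumed EDM-style value


def pf_ode_drift(x: float, t: float) -> float:
    """dx/dt = -t * score(x, t) for the Gaussian toy marginal N(MU, SIGMA_D^2 + t^2)."""
    return t * (x - MU) / (SIGMA_D**2 + t**2)


def bidirectional_consistency(x: float, t: float, u: float) -> float:
    """Exact jump from x at noise level t to the same trajectory at level u.

    u < t denoises (generation); u > t adds noise deterministically (inversion).
    For this toy, (x - MU) / sqrt(SIGMA_D^2 + t^2) is constant along a trajectory.
    """
    scale = math.sqrt(SIGMA_D**2 + u**2) / math.sqrt(SIGMA_D**2 + t**2)
    return MU + (x - MU) * scale


def euler_solve(x: float, t: float, u: float, steps: int = 100_000) -> float:
    """Slow reference: integrate the PF ODE from t to u with explicit Euler steps."""
    h = (u - t) / steps
    for i in range(steps):
        x += h * pf_ode_drift(x, t + i * h)
    return x


def sample_one_step(n: int, seed: int = 0) -> list[float]:
    """One-step generation: draw x_T from the exact noisy marginal, jump straight to t = 0."""
    rng = random.Random(seed)
    std_T = math.sqrt(SIGMA_D**2 + T_MAX**2)
    return [bidirectional_consistency(MU + std_T * rng.gauss(0.0, 1.0), T_MAX, 0.0)
            for _ in range(n)]


if __name__ == "__main__":
    x_T = MU + math.sqrt(SIGMA_D**2 + T_MAX**2) * 1.3         # a fixed noisy starting point
    x_0 = bidirectional_consistency(x_T, T_MAX, 0.0)          # one-step generation
    x_T_rec = bidirectional_consistency(x_0, 0.0, T_MAX)      # one-step inversion
    x_0_two = bidirectional_consistency(
        bidirectional_consistency(x_T, T_MAX, 1.0), 1.0, 0.0)  # two jumps via t = 1
    print("one-step x_0:   ", x_0)
    print("two-jump x_0:   ", x_0_two)
    print("Euler x_0:      ", euler_solve(x_T, T_MAX, 0.0))
    print("inversion error:", abs(x_T_rec - x_T))
```

Because the map in this toy is exact, inverting a generated sample recovers its noise to floating-point precision, and splitting one jump into two lands on the same point (the self-consistency property). A trained BCM only approximates such a map with a neural network, so its one-step outputs are not exact; the abstract's option of taking additional steps targets that gap.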

URL

https://arxiv.org/abs/2403.18035

PDF

https://arxiv.org/pdf/2403.18035.pdf
