
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

2024-01-04 02:20:54
Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai

Abstract

Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that are both correct and crisp. Given the recent success of the diffusion probabilistic model (DPM), we find it especially suitable for accurate and crisp edge detection, since the denoising process is applied directly at the original image size. We therefore propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. To avoid expensive computation while retaining the final performance, we apply the DPM in latent space and enable the classic cross-entropy loss, which is uncertainty-aware at the pixel level, to directly optimize the parameters in latent space in a distillation manner. We also adopt a decoupled architecture to speed up the denoising process and propose a corresponding adaptive Fourier filter to adjust the latent features of specific frequencies. With all these technical designs, DiffusionEdge can be trained stably with limited resources, predicting crisp and accurate edge maps with far fewer augmentation strategies. Extensive experiments on four edge detection benchmarks demonstrate the superiority of DiffusionEdge in both correctness and crispness. On the NYUDv2 dataset, compared to the second best, we increase the ODS, OIS (without post-processing) and AC by 30.2%, 28.1% and 65.1%, respectively. Code: this https URL.
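
The abstract reports ODS and OIS without defining them. As a rough illustration only, the sketch below assumes the conventional benchmark definitions: ODS (optimal dataset scale) is the best F-measure obtained with one threshold shared by every image, and OIS (optimal image scale) is the F-measure obtained when each image uses its own best threshold. The function names, the count layout, and the toy numbers are all hypothetical and are not taken from the paper or its code.

```python
# Minimal sketch of ODS vs. OIS, assuming the conventional definitions.
# All names and numbers here are hypothetical, for illustration only.

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def score(matched_gt, total_gt, matched_pred, total_pred):
    """F-measure from raw match counts."""
    recall = matched_gt / total_gt if total_gt else 0.0
    precision = matched_pred / total_pred if total_pred else 0.0
    return f_measure(precision, recall)

def ods(counts):
    """Best F-measure using one threshold for the whole dataset.

    counts[i][t] = (matched_gt, total_gt, matched_pred, total_pred)
    for image i at threshold index t.
    """
    n_thresholds = len(counts[0])
    best = 0.0
    for t in range(n_thresholds):
        # Sum each of the four counts over all images at threshold t.
        totals = [sum(image[t][k] for image in counts) for k in range(4)]
        best = max(best, score(*totals))
    return best

def ois(counts):
    """F-measure when every image uses its own best threshold."""
    totals = [0, 0, 0, 0]
    for image in counts:
        best_t = max(range(len(image)), key=lambda t: score(*image[t]))
        for k in range(4):
            totals[k] += image[best_t][k]
    return score(*totals)

# Hypothetical counts: two images, two thresholds each.
toy_counts = [
    [(8, 10, 8, 16), (6, 10, 6, 7)],   # image A
    [(9, 10, 9, 10), (4, 10, 4, 4)],   # image B
]

ods_value = ods(toy_counts)
ois_value = ois(toy_counts)
```

In this toy data the two images prefer different thresholds, so the per-image choice used by OIS gives a higher score than the single shared threshold used by ODS.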

URL

https://arxiv.org/abs/2401.02032

PDF

https://arxiv.org/pdf/2401.02032.pdf

