
Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling

2024-05-05 14:05:33
Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, Rizen Guo

Abstract

Recently developed generative methods for image rescaling, including those based on invertible rescaling networks (IRN) and generative adversarial networks (GAN), have demonstrated exceptional performance. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which hinders their use in real applications. To address this issue, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods that directly model high-frequency information as a standard Gaussian distribution, BDFlow first decouples the high-frequency information into a semantic high-frequency part that adheres to a Boundary distribution and a non-semantic high-frequency counterpart that adheres to a Gaussian distribution. Specifically, to capture the semantic high-frequency part accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments demonstrate that BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 on average over GRAIN, using only 74% of the parameters and 20% of the computation. The code will be available at this https URL.
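To put the reported PSNR gain in perspective, the sketch below computes PSNR from mean squared error and converts a dB gain into the corresponding reduction in MSE. It is a minimal illustration of the standard metric, not code from the paper: the function names `psnr` and `mse_reduction_factor` are hypothetical, images are assumed to be flat sequences of floats scaled to [0, peak], and the conversion is exact only for a single image, since an average PSNR gain over a dataset does not map exactly to one MSE ratio.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the reconstruction.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_reduction_factor(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (single image)."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical toy data, used only to exercise the functions.
    print(psnr([0.0, 0.5, 1.0], [0.1, 0.4, 0.9]))
    print(mse_reduction_factor(4.4))
```

Under these assumptions, a 4.4 dB gain on a single image corresponds to an MSE roughly 2.75 times smaller.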

Abstract (translated)

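In recent years, generative methods for image rescaling, including those based on invertible rescaling networks (IRN) and generative adversarial networks (GAN), have shown excellent performance. However, IRN methods tend to produce over-smoothed results, while GAN methods easily generate fake details, which hinders their real applications. To solve this problem, we propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results. Unlike previous methods, BDFlow first decomposes the high-frequency information into a semantic high-frequency part and a non-semantic high-frequency counterpart that follows a Gaussian distribution. Specifically, to capture the semantic high-frequency part accurately, we use a Boundary-aware Mask (BAM) to constrain the model to produce rich textures, while the non-semantic high-frequency part is randomly sampled from a Gaussian distribution. Comprehensive experiments show that BDFlow significantly outperforms other state-of-the-art methods while maintaining lower complexity. Notably, BDFlow improves the PSNR by 4.4 dB and the SSIM by 0.1 over GRAIN, using only 74% of the parameters and 20% of the computation. The code will be available at this https URL.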

URL

https://arxiv.org/abs/2405.02941

PDF

https://arxiv.org/pdf/2405.02941.pdf

