
FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution

2025-06-17 02:33:42
Siyu Xu, Wenjie Li, Guangwei Gao, Jian Yang, Guo-Jun Qi, Chia-Wen Lin

Abstract

Face super-resolution (FSR) under limited computational costs remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources and degraded FSR performance. CNNs are relatively sensitive to high-frequency facial features, such as component contours and facial outlines. Meanwhile, Mamba excels at capturing low-frequency features like facial color and fine-grained texture, and does so with lower complexity than Transformers. Motivated by these observations, we propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components and processes them via dedicated branches. For low-frequency regions, we introduce a Mamba-based Low-Frequency Enhancement Block (LFEB), which combines state-space attention with squeeze-and-excitation operations to extract low-frequency global interactions and emphasize informative channels. For high-frequency regions, we design a CNN-based Deep Position-Aware Attention (DPA) module to enhance spatially dependent structural details, complemented by a lightweight High-Frequency Refinement (HFR) module that further refines frequency-specific representations. Through these designs, our method achieves an excellent balance between FSR quality and model efficiency, outperforming existing approaches.
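
The abstract describes the architecture only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of the frequency-aware dual-path idea: features are split into a low-frequency part (here via average pooling and upsampling) and a high-frequency residual, each processed by its own branch before fusion. The branch internals (a 1x1 convolution plus squeeze-and-excitation standing in for the Mamba-based LFEB, and plain convolutions standing in for the DPA/HFR modules) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the frequency-aware dual-path idea from the abstract.
# Branch internals are simplified stand-ins, NOT the paper's LFEB/DPA/HFR modules.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeExcitation(nn.Module):
    """Standard squeeze-and-excitation channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))      # global average pool -> channel weights
        return x * w[:, :, None, None]


class DualPathBlock(nn.Module):
    """Split features into low/high-frequency parts and process them in separate branches."""
    def __init__(self, channels):
        super().__init__()
        # Low-frequency branch: simplified stand-in for the Mamba-based LFEB
        # (the paper uses state-space attention plus squeeze-and-excitation here).
        self.low_branch = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU())
        self.low_se = SqueezeExcitation(channels)
        # High-frequency branch: plain convolutions standing in for the CNN-based DPA + HFR.
        self.high_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        # Frequency decomposition: a blurred copy (average pool + upsample) keeps the
        # low frequencies; the residual keeps high-frequency details such as contours.
        low = F.interpolate(F.avg_pool2d(x, 2), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        high = x - low
        low = self.low_se(self.low_branch(low))
        high = self.high_branch(high)
        return x + self.fuse(torch.cat([low, high], dim=1))


if __name__ == "__main__":
    feats = torch.randn(1, 32, 32, 32)        # e.g. features of a 32x32 face crop
    print(DualPathBlock(32)(feats).shape)     # -> torch.Size([1, 32, 32, 32])
```

The blur-and-subtract split used above is one common, lightweight way to approximate a low-/high-frequency decomposition; the paper's actual decomposition and its Mamba state-space blocks may differ.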


URL

https://arxiv.org/abs/2506.14121

PDF

https://arxiv.org/pdf/2506.14121.pdf

