Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network

2024-04-22 08:44:10
Qiwen Deng, Yangcen Liu, Wen Li, Guoqing Wang

Abstract

Given a source portrait, the automatic human body reshaping task aims to edit it into an aesthetically pleasing body shape. As the technology is widely used in media, several methods have been proposed, mainly focusing on generating optical flow to warp the body shape. However, these previous works consider only the local transformation of different body parts (arms, torso, and legs) and ignore the global affinity between them, which limits their capacity to ensure consistency and quality across the entire body. In this paper, we propose a novel Adaptive Affinity-Graph Network (AAGN), which extracts the global affinity between different body parts to enhance the quality of the generated optical flow. Specifically, our AAGN introduces two main designs. (1) We propose an Adaptive Affinity-Graph (AAG) Block that leverages the characteristics of a fully connected graph. The AAG Block represents different body parts as nodes in an adaptive fully connected graph and captures the affinities between all pairs of nodes to obtain a global affinity map. This design improves the consistency between body parts. (2) Because high-frequency details are crucial for photo aesthetics, we design a Body Shape Discriminator (BSD) that extracts information from both the high-frequency and spatial domains. In particular, an SRM filter is used to extract high-frequency details, which are combined with spatial features as input to the BSD. With this design, the BSD guides the Flow Generator (FG) to attend to fine details rather than rigid pixel-level fitting. Extensive experiments on the BR-5K dataset demonstrate that our framework significantly enhances the aesthetic appeal of reshaped photos, marginally surpassing all previous work to achieve state-of-the-art results on all evaluation metrics.
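
The two ideas named in the abstract, a global affinity map over body-part nodes and a high-pass filter for fine details, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names (`affinity_map`, `aggregate`, `high_pass`), the scaled dot-product softmax used as the affinity measure, the toy feature sizes, and the choice of one 5x5 high-pass kernel commonly associated with SRM in the steganalysis literature. The abstract does not state which affinity function or which SRM kernels AAGN actually uses.

```python
import numpy as np

def affinity_map(parts):
    """Pairwise affinities between body-part nodes of a fully connected graph.

    parts: (N, D) array, one D-dimensional feature vector per body part.
    Returns an (N, N) matrix whose rows sum to 1.
    Assumption: scaled dot-product + softmax stands in for the paper's affinity.
    """
    scores = parts @ parts.T / np.sqrt(parts.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def aggregate(parts):
    """Update every node with an affinity-weighted mix of all nodes."""
    return affinity_map(parts) @ parts

# Assumption: one widely used 5x5 high-pass kernel from SRM-style filtering.
# Its entries sum to zero, so flat image regions give zero response.
SRM_KERNEL = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=float) / 12.0

def high_pass(image, kernel=SRM_KERNEL):
    """'Valid' 2D cross-correlation of a grayscale image with the kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = rng.normal(size=(3, 8))   # hypothetical arms, torso, legs features
    print(affinity_map(parts).shape)  # (3, 3)
    print(high_pass(rng.normal(size=(16, 16))).shape)  # (12, 12)
```

In this sketch every node attends to every other node, which is what makes the graph fully connected; a discriminator along the lines the abstract describes would receive the `high_pass` response alongside the ordinary spatial features.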

URL

https://arxiv.org/abs/2404.13983

PDF

https://arxiv.org/pdf/2404.13983.pdf

