
Stable Attribute Group Editing for Reliable Few-shot Image Generation

2023-02-01 01:51:47
Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin Jin, Dandan Tu, Qingming Huang

Abstract

Few-shot image generation aims to generate data of an unseen category based on only a few samples. Beyond basic content generation, a range of downstream applications, such as low-data detection and few-shot classification, are expected to benefit from this task. To achieve this goal, the generated images should guarantee category retention for classification in addition to visual quality and diversity. In our preliminary work, we presented an "editing-based" framework, Attribute Group Editing (AGE), for reliable few-shot image generation, which largely improves generation performance. Nevertheless, AGE's performance on downstream classification is not as satisfactory as expected. This paper investigates the class inconsistency problem and proposes Stable Attribute Group Editing (SAGE) for more stable class-relevant image generation. SAGE makes use of all the given few-shot images and estimates a class center embedding based on the category-relevant attribute dictionary. Meanwhile, according to the projection weights on the category-relevant attribute dictionary, we select category-irrelevant attributes from similar seen categories. Consequently, SAGE injects the whole distribution of the novel class into StyleGAN's latent space, thus largely preserving the category retention and stability of the generated images. Going one step further, we find that class inconsistency is a common problem in GAN-generated images for downstream classification. Even though the generated images look photo-realistic and require no category-relevant editing, they are usually of limited help for downstream classification. We systematically discuss this issue from both the generative model and classification model perspectives, and propose to boost the downstream classification performance of SAGE by enhancing the pixel and frequency components.
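To make the editing pipeline described above more concrete, below is a minimal sketch of one possible reading of it: few-shot latent codes are projected onto a category-relevant attribute dictionary to estimate a class-center embedding, and category-irrelevant edit directions (here approximated as directions outside the dictionary's span) are added to produce new latents for a pretrained StyleGAN generator. All names, shapes, and the orthogonal-complement construction are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Hypothetical sketch of the SAGE editing step described in the abstract.
# Dimensions, dictionary size, and the edit-direction construction are assumptions.

rng = np.random.default_rng(0)

d = 512        # StyleGAN latent dimension (assumed)
n_shots = 3    # few-shot images of the unseen category
k = 32         # size of the category-relevant attribute dictionary (assumed)

w_shots = rng.normal(size=(n_shots, d))    # inverted latent codes of the few shots
attr_dict = rng.normal(size=(k, d))        # category-relevant attribute dictionary A

# Project each few-shot code onto the dictionary (least-squares weights),
# then average the reconstructions to estimate a class-center embedding.
coef, *_ = np.linalg.lstsq(attr_dict.T, w_shots.T, rcond=None)  # (k, n_shots)
class_center = attr_dict.T @ coef.mean(axis=1)                  # (d,)

# Category-irrelevant editing: sample a direction outside the span of the
# category-relevant dictionary and add it to the class center.
q, _ = np.linalg.qr(attr_dict.T, mode="complete")  # orthonormal basis of R^d
irrelevant_basis = q[:, k:]                         # directions orthogonal to span(A)
edit = irrelevant_basis @ rng.normal(scale=0.1, size=(d - k,))

w_new = class_center + edit  # latent code to feed into a pretrained StyleGAN generator
print(w_new.shape)
```

In this reading, keeping the class center fixed while varying only the category-irrelevant edit is what would preserve category retention across the generated samples.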


URL

https://arxiv.org/abs/2302.00179

PDF

https://arxiv.org/pdf/2302.00179.pdf

