Paper Reading AI Learner

Learning Feature-to-Feature Translator by Alternating Back-Propagation for Zero-Shot Learning

2019-04-22 20:30:21
Yizhe Zhu, Jianwen Xie, Bingchen Liu, Ahmed Elgammal

Abstract

We investigate learning feature-to-feature translator networks by alternating back-propagation as a general-purpose solution to zero-shot learning (ZSL) problems. Our method belongs to the family of generative model-based ZSL approaches. In contrast to GANs or VAEs, which require auxiliary networks to assist training, our model consists of a single conditional generator that maps a class feature and a latent vector, accounting for randomness in the output, to an image feature, and is trained by maximum likelihood estimation. The training process is a simple yet effective EM-like procedure that iterates two steps: (i) inferential back-propagation, which infers the latent noise vector of each observed example, and (ii) learning back-propagation, which updates the parameters of the model. With slight modifications, our model also provides a solution to learning from incomplete visual features for ZSL. We conduct extensive comparisons with existing generative ZSL methods on five benchmarks, demonstrating the superiority of our method not only in performance but also in convergence speed and computational cost. Specifically, our model outperforms existing state-of-the-art methods by margins of up to $3.1\%$ and $4.0\%$ in the ZSL and generalized ZSL settings, respectively.
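The two-step EM-like loop above can be sketched in a toy, pure-Python form. This is only an illustration of the alternating scheme, not the authors' implementation: a hypothetical linear generator `x ≈ Wz·z + Wc·c` stands in for the paper's deep conditional network, the Langevin noise term of the inferential step is omitted for determinism, and all dimensions, step sizes, and iteration counts are made-up values.

```python
import random

random.seed(0)
DZ, DC, DX = 2, 3, 4  # latent, class-feature, and image-feature dims (illustrative)

def matvec(W, v):
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def generate(Wz, Wc, z, c):
    # Toy conditional generator: x_hat = Wz @ z + Wc @ c
    return [a + b for a, b in zip(matvec(Wz, z), matvec(Wc, c))]

def infer_z(Wz, Wc, x, c, z, steps=50, lr=0.05):
    """Inferential back-propagation: gradient steps on z minimizing
    ||x - g(z, c)||^2 / 2 + ||z||^2 / 2 (Gaussian prior on z)."""
    for _ in range(steps):
        r = [xi - gi for xi, gi in zip(x, generate(Wz, Wc, z, c))]
        grad = [-sum(Wz[i][j] * r[i] for i in range(DX)) + z[j] for j in range(DZ)]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

def learn_step(Wz, Wc, data, zs, lr=0.05):
    """Learning back-propagation: per-example gradient updates on the
    generator weights, with the inferred latent vectors held fixed."""
    for (x, c), z in zip(data, zs):
        r = [xi - gi for xi, gi in zip(x, generate(Wz, Wc, z, c))]
        for i in range(DX):
            for j in range(DZ):
                Wz[i][j] += lr * r[i] * z[j]
            for j in range(DC):
                Wc[i][j] += lr * r[i] * c[j]

# Synthetic (x, c) pairs from a ground-truth linear map.
true_Wz = [[random.gauss(0, 1) for _ in range(DZ)] for _ in range(DX)]
true_Wc = [[random.gauss(0, 1) for _ in range(DC)] for _ in range(DX)]
data = []
for _ in range(20):
    c = [random.gauss(0, 1) for _ in range(DC)]
    z = [random.gauss(0, 1) for _ in range(DZ)]
    data.append((generate(true_Wz, true_Wc, z, c), c))

# Small random init (symmetry breaking), then alternate the two steps.
Wz = [[random.gauss(0, 0.1) for _ in range(DZ)] for _ in range(DX)]
Wc = [[random.gauss(0, 0.1) for _ in range(DC)] for _ in range(DX)]
zs = [[0.0] * DZ for _ in data]
for epoch in range(200):
    zs = [infer_z(Wz, Wc, x, c, z) for (x, c), z in zip(data, zs)]
    learn_step(Wz, Wc, data, zs)

baseline = sum(sum(xi * xi for xi in x) for x, _ in data) / len(data)
mse = sum(sum((xi - gi) ** 2 for xi, gi in zip(x, generate(Wz, Wc, z, c)))
          for (x, c), z in zip(data, zs)) / len(data)
print(f"mean ||x||^2: {baseline:.3f}, reconstruction MSE: {mse:.3f}")
```

After training, the reconstruction error drops well below the trivial baseline, illustrating how the single generator is fit by alternating latent inference and parameter updates without any auxiliary discriminator or encoder network.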


URL

https://arxiv.org/abs/1904.10056

PDF

https://arxiv.org/pdf/1904.10056.pdf

