Abstract
We investigate learning feature-to-feature translator networks by alternating back-propagation as a general-purpose solution to zero-shot learning (ZSL) problems. Our method belongs to the family of generative model-based ZSL approaches. In contrast to GANs or VAEs, which require auxiliary networks to assist training, our model consists of a single conditional generator that maps a class feature, together with a latent vector accounting for randomness in the output, to an image feature, and is trained by maximum likelihood estimation. The training process is a simple yet effective EM-like procedure that iterates two steps: (i) inferential back-propagation, which infers the latent noise vector of each observed example, and (ii) learning back-propagation, which updates the parameters of the model. With slight modifications, our model also provides a solution to learning from incomplete visual features for ZSL. We conduct extensive comparisons with existing generative ZSL methods on five benchmarks, demonstrating the superiority of our method not only in accuracy but also in convergence speed and computational cost. Specifically, our model outperforms existing state-of-the-art methods by margins of up to $3.1\%$ and $4.0\%$ in the ZSL and generalized ZSL settings, respectively.
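The two-step training loop described above can be sketched in miniature. The sketch below uses a hypothetical linear generator as a stand-in for the paper's deep conditional network, plain gradient ascent for the inferential step (the full method typically uses gradient-based or Langevin-style inference), and Gaussian noise assumptions; all names, dimensions, and hyperparameters here are illustrative, not the paper's values.

```python
import numpy as np

# Sketch of alternating back-propagation for a conditional generator
# x ~ G(z, c) + noise. A linear generator W @ [z; c] stands in for the
# paper's deep network; the toy data and step sizes are assumptions.
rng = np.random.default_rng(0)
d_x, d_z, d_c, n = 8, 2, 3, 64
sigma = 0.3  # assumed observation-noise scale in p(x | z, c)

# Synthetic "image features" X paired with "class features" C
W_true = rng.normal(size=(d_x, d_z + d_c))
C = rng.normal(size=(n, d_c))
X = (np.hstack([rng.normal(size=(n, d_z)), C]) @ W_true.T
     + sigma * rng.normal(size=(n, d_x)))

W = rng.normal(scale=0.1, size=(d_x, d_z + d_c))  # generator parameters
Z = np.zeros((n, d_z))                            # latent vectors to infer

for epoch in range(200):
    # (i) inferential back-propagation: ascend log p(x, z) w.r.t. each z,
    # with a standard-normal prior on z giving the -Z term.
    for _ in range(10):
        resid = X - np.hstack([Z, C]) @ W.T
        Z += 0.01 * (resid @ W[:, :d_z] / sigma**2 - Z)
    # (ii) learning back-propagation: ascend the log-likelihood w.r.t. W
    H = np.hstack([Z, C])
    resid = X - H @ W.T
    W += 0.005 * resid.T @ H / (n * sigma**2)

# Reconstruction error of the learned generator on the training features
recon_err = np.mean((X - np.hstack([Z, C]) @ W.T) ** 2)
```

In the full model, both gradients come from back-propagation through the same generator network, which is why no auxiliary discriminator or inference network is needed: the generator itself serves both steps.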
URL
https://arxiv.org/abs/1904.10056