Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) is the task of retrieving natural images from a gallery using a free-hand sketch as the query, under a zero-shot scenario. Previous works mostly adopt a generative approach that takes a highly abstract and sparse sketch as input and synthesizes the corresponding natural image. However, the intrinsic visual sparsity and large intra-class variance of sketches make the conditional decoder harder to learn, leading to unsatisfactory retrieval performance. In this paper, we propose a novel stacked semantic-guided network to address the unique characteristics of sketches in ZS-SBIR. Specifically, we devise multi-layer feature fusion networks that incorporate different intermediate feature representations of a deep neural network to alleviate the intrinsic sparsity of sketches. To improve visual knowledge transfer from seen to unseen classes, we elaborate a coarse-to-fine conditional decoder that first generates coarse-grained, category-specific corresponding features (taking auxiliary semantic information as conditional input) and then generates fine-grained, instance-specific corresponding features (taking the sketch representation as conditional input). Furthermore, a regression loss and a classification loss are utilized to preserve the semantic and discriminative information of the synthesized features, respectively. Extensive experiments on the large-scale Sketchy and TU-Berlin datasets demonstrate that our proposed approach outperforms state-of-the-art methods by more than 20% in retrieval performance.
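The coarse-to-fine conditional decoder described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the dimensions, the single linear layer per stage, and the tanh nonlinearity are all hypothetical; the abstract only specifies that stage one is conditioned on auxiliary semantic information (e.g. a class embedding) and stage two on the sketch representation, with a regression loss pulling the synthesized feature toward the real image feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: semantic embedding, sketch feature, image feature.
D_SEM, D_SKETCH, D_FEAT = 300, 512, 512

def linear(d_in, d_out):
    """Random weight matrix standing in for a learned linear layer."""
    return rng.standard_normal((d_in, d_out)) * 0.01

# Stage 1: coarse decoder conditioned on the auxiliary semantic embedding,
# producing a coarse-grained, category-specific feature.
W_coarse = linear(D_SEM, D_FEAT)
# Stage 2: fine decoder conditioned on the sketch representation (stacked on
# the coarse output), producing a fine-grained, instance-specific feature.
W_fine = linear(D_FEAT + D_SKETCH, D_FEAT)

def coarse_to_fine(sem_emb, sketch_feat):
    coarse = np.tanh(sem_emb @ W_coarse)
    fine = np.tanh(np.concatenate([coarse, sketch_feat]) @ W_fine)
    return coarse, fine

sem = rng.standard_normal(D_SEM)       # e.g. a word vector for the class
sketch = rng.standard_normal(D_SKETCH) # encoded sketch representation
coarse, fine = coarse_to_fine(sem, sketch)

# Regression loss from the abstract: the synthesized feature should match the
# real image feature. (The classification loss is omitted in this sketch.)
image_feat = rng.standard_normal(D_FEAT)
reg_loss = np.mean((fine - image_feat) ** 2)
print(coarse.shape, fine.shape, reg_loss)
```

At retrieval time, such a synthesized feature would be compared against gallery image features; here the weights are random, so only the data flow of the two conditioning stages is shown.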
URL
https://arxiv.org/abs/1904.01971