Abstract
Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision that retrieves natural images relevant to sketch queries whose categories may not have been seen during training. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that requires supervision only at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminative side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets.
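The cycle-consistency idea the abstract describes can be sketched in a few lines: a forward generator maps visual features into a semantic space, a backward generator maps them back, and the reconstruction error is penalized so that only category-level supervision is needed. The sketch below is purely illustrative — the dimensions, linear "generators", and L1 loss are assumptions for exposition, not the authors' SEM-PCYC implementation (which uses adversarial training and a feature selection auto-encoder on top of this).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper): 512-d visual
# features and a 300-d semantic space (e.g. word-embedding sized).
VIS_DIM, SEM_DIM = 512, 300

# Linear stand-ins for the paired generators of one branch:
# F maps visual -> semantic, G maps semantic -> visual.
F = rng.normal(scale=0.01, size=(VIS_DIM, SEM_DIM))
G = rng.normal(scale=0.01, size=(SEM_DIM, VIS_DIM))

def cycle_consistency_loss(x):
    """Mean L1 error after a visual -> semantic -> visual round trip."""
    reconstructed = x @ F @ G
    return float(np.abs(reconstructed - x).mean())

x = rng.normal(size=(8, VIS_DIM))   # a batch of 8 visual feature vectors
loss = cycle_consistency_loss(x)    # non-negative scalar to be minimized
```

In the full model, minimizing this loss (jointly with adversarial and classification losses) keeps the semantic embedding faithful to the visual input without requiring instance-level sketch-image alignment.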
URL
https://arxiv.org/abs/1903.03372