Paper Reading AI Learner

Stacked Semantic-Guided Network for Zero-Shot Sketch-Based Image Retrieval

2019-04-03 12:33:47
Hao Wang, Cheng Deng, Xinxu Xu, Wei Liu, Xinbo Gao, Dacheng Tao

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) is the task of retrieving natural images from a gallery using a free-hand sketch as the query, under a zero-shot scenario. Previous works mostly adopt a generative approach that takes a highly abstract and sparse sketch as input and synthesizes the corresponding natural image. However, the intrinsic visual sparsity and large intra-class variance of sketches make the conditional decoder harder to learn and hence lead to unsatisfactory retrieval performance. In this paper, we propose a novel stacked semantic-guided network to address the unique characteristics of sketches in ZS-SBIR. Specifically, we devise multi-layer feature fusion networks that incorporate intermediate feature representations from different layers of a deep neural network to alleviate the intrinsic sparsity of sketches. To improve the transfer of visual knowledge from seen to unseen classes, we design a coarse-to-fine conditional decoder that first generates coarse-grained, category-specific corresponding features (taking auxiliary semantic information as conditional input) and then generates fine-grained, instance-specific corresponding features (taking the sketch representation as conditional input). Furthermore, a regression loss and a classification loss are employed to preserve the semantic and discriminative information of the synthesized features, respectively. Extensive experiments on the large-scale Sketchy and TU-Berlin datasets demonstrate that our proposed approach outperforms state-of-the-art methods by more than 20% in retrieval performance.
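The coarse-to-fine conditioning described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the random linear layers, and the `coarse_to_fine` function are all hypothetical, chosen only to show the two-stage idea (a category-level feature generated from a semantic vector, then refined into an instance-level feature using the sketch representation).

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # hypothetical randomly initialized linear layer (stand-in for a learned one)
    return rng.standard_normal((in_dim, out_dim)) * 0.01

# hypothetical dimensions (not specified in the abstract)
SEM_DIM, SKETCH_DIM, FEAT_DIM = 300, 512, 512

W_coarse = linear(SEM_DIM, FEAT_DIM)               # stage 1: semantic -> coarse feature
W_fine = linear(FEAT_DIM + SKETCH_DIM, FEAT_DIM)   # stage 2: coarse + sketch -> fine feature

def coarse_to_fine(semantic_vec, sketch_feat):
    """Two-stage decoding: category-specific first, then instance-specific."""
    # coarse-grained, category-specific feature conditioned on semantic information
    coarse = np.tanh(semantic_vec @ W_coarse)
    # fine-grained, instance-specific feature conditioned on the sketch representation
    fine = np.tanh(np.concatenate([coarse, sketch_feat]) @ W_fine)
    return coarse, fine

sem = rng.standard_normal(SEM_DIM)    # e.g. a word embedding of the class name
sk = rng.standard_normal(SKETCH_DIM)  # fused multi-layer sketch representation
coarse, fine = coarse_to_fine(sem, sk)
print(coarse.shape, fine.shape)
```

In training, the abstract's regression loss would pull `fine` toward the real image feature (e.g. a mean-squared error), while the classification loss would keep the synthesized features discriminative across classes.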

URL

https://arxiv.org/abs/1904.01971

PDF

https://arxiv.org/pdf/1904.01971.pdf
