Paper Reading AI Learner

Asymmetric Deep Semantic Quantization for Image Retrieval

2019-03-29 13:00:03
Zhan Yang, Osolo Ian Raymond, WuQing Sun, Jun Long

Abstract

Due to its fast retrieval and storage efficiency, hashing has been widely used in nearest-neighbor retrieval tasks. Using deep-learning-based techniques, learned hashing can outperform non-learning-based hashing in many applications. However, previous learning-based hashing methods have limitations: the learned hash codes are not discriminative because the methods cannot discover rich semantic information, and the training strategy has difficulty optimizing the discrete binary codes. In this paper, we propose a novel learning-based hashing method named Asymmetric Deep Semantic Quantization (ADSQ). ADSQ is implemented as a three-stream framework consisting of one LabelNet and two ImgNets. The LabelNet uses three fully-connected layers to capture rich semantic information between image pairs. The two ImgNets adopt the same convolutional neural network structure but with different weights (i.e., asymmetric convolutional neural networks) and are used to generate discriminative compact hash codes. Specifically, the LabelNet captures rich semantic information that guides the two ImgNets in minimizing the gap between the real-valued continuous features and the discrete binary codes. In this way, ADSQ makes full use of the most critical semantic information to guide feature learning while keeping the common semantic space and the Hamming space consistent. Our experiments demonstrate that ADSQ generates highly discriminative compact hash codes and outperforms current state-of-the-art methods on three benchmark datasets: CIFAR-10, NUS-WIDE, and ImageNet.
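To make the retrieval setting concrete, the following is a minimal, hypothetical sketch of the hashing pipeline the abstract describes: real-valued network features are binarized with the sign function (the "quantization" step whose gap ADSQ aims to minimize), and retrieval ranks database items by Hamming distance. All feature values and image names below are illustrative assumptions, not from the paper.

```python
# Toy sketch of hashing-based image retrieval (not the ADSQ model itself):
# real-valued features -> sign binarization -> Hamming-distance ranking.

def binarize(features):
    """Quantize real-valued features into {-1, +1} binary codes via sign."""
    return [1 if f >= 0 else -1 for f in features]

def hamming_distance(code_a, code_b):
    """Count differing bits between two {-1, +1} codes."""
    return sum(1 for a, b in zip(code_a, code_b) if a != b)

# Hypothetical database of real-valued features (e.g., an ImgNet's outputs).
database = {
    "img_cat": [0.9, -0.4, 0.2, -0.8],
    "img_dog": [0.7, -0.1, 0.5, -0.6],
    "img_car": [-0.9, 0.8, -0.7, 0.3],
}
db_codes = {name: binarize(f) for name, f in database.items()}

# Binarize a query's features and rank the database by Hamming distance;
# semantically similar images should end up with nearby binary codes.
query_code = binarize([0.8, -0.3, 0.4, -0.5])
ranking = sorted(db_codes, key=lambda n: hamming_distance(query_code, db_codes[n]))
print(ranking)  # most similar items first; "img_car" ranks last here
```

The sign function is non-differentiable, which is why training strategies struggle to optimize discrete codes directly; ADSQ's LabelNet guidance is aimed at shrinking the gap between the continuous features and the codes this quantization produces.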


URL

https://arxiv.org/abs/1903.12493

PDF

https://arxiv.org/pdf/1903.12493.pdf

