Paper Reading AI Learner

Shared Predictive Cross-Modal Deep Quantization

2019-04-16 06:29:02
Erkun Yang, Cheng Deng, Chao Li, Wei Liu, Jie Li, Dacheng Tao

Abstract

With the explosive growth of data volume and the ever-increasing diversity of data modalities, cross-modal similarity search, which conducts nearest-neighbor search across different modalities, has attracted increasing interest. This paper presents a deep compact-code learning solution for efficient cross-modal similarity search. Many recent studies have shown that quantization-based approaches generally perform better than hashing-based approaches on single-modal similarity search. In this paper, we propose a deep quantization approach, which is among the early attempts to leverage deep neural networks for quantization-based cross-modal similarity search. Our approach, dubbed shared predictive deep quantization (SPDQ), explicitly formulates a shared subspace across the modalities and a private subspace for each individual modality. Representations in the shared and private subspaces are learned simultaneously by embedding them into a reproducing kernel Hilbert space, where the mean embeddings of the different modality distributions can be explicitly compared. In addition, in the shared subspace, a quantizer is learned to produce semantics-preserving compact codes with the help of label alignment. Thanks to this novel network architecture, in cooperation with supervised quantization training, SPDQ preserves intramodal and intermodal similarities as much as possible and greatly reduces quantization error. Experiments on two popular benchmarks corroborate that our approach outperforms state-of-the-art methods.
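The abstract's core alignment mechanism, comparing the mean embeddings of different modality distributions in a reproducing kernel Hilbert space, corresponds to the maximum mean discrepancy (MMD). The sketch below shows the standard biased squared-MMD estimate with a Gaussian kernel; the kernel choice, the `sigma` bandwidth, and the plain NumPy arrays standing in for learned deep features are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    # x: (n, d), y: (m, d) -> (n, m) via broadcasting.
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(diff * diff, axis=2) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimate of || mu_X - mu_Y ||^2 in the RKHS induced by
    # the Gaussian kernel, where mu_X and mu_Y are the kernel mean
    # embeddings of the two sample sets (here: two modalities).
    kxx = gaussian_kernel(X, X, sigma).mean()
    kyy = gaussian_kernel(Y, Y, sigma).mean()
    kxy = gaussian_kernel(X, Y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

In a setup like SPDQ's, a term of this form would be minimized between the shared-subspace representations of the two modalities, driving their distributions together; samples from the same distribution give a value near zero, while shifted distributions give a larger one.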

URL

https://arxiv.org/abs/1904.07488

PDF

https://arxiv.org/pdf/1904.07488.pdf

