Paper Reading AI Learner

Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval

2019-04-05 06:30:22
Yadan Luo, Ziwei Wang, Zi Huang, Yang Yang, Huimin Lu

Abstract

With the increasing number of online stores, there is a pressing need for intelligent search systems that can understand the item photos snapped by customers and search large-scale product databases to find the desired items. However, it is challenging for conventional retrieval systems to match the item photos captured by customers with the ones officially released by stores, especially for garment images. To bridge customer- and store-provided garment photos, existing studies have widely exploited clothing attributes (e.g., black) and landmarks (e.g., collar) to learn a common embedding space for garment representations. Unfortunately, they omit the sequential correlation of attributes and consume a large amount of human labor to label the landmarks. In this paper, we propose a deep multi-task cross-domain hashing method, termed DMCH, in which cross-domain embedding and sequential attribute learning are modeled simultaneously. Sequential attribute learning not only provides semantic guidance for the embedding, but also generates rich attention on discriminative local details (e.g., black buttons) of clothing items without requiring extra landmark labels. This leads to promising performance and a 306× efficiency boost over state-of-the-art models, as demonstrated through rigorous experiments on two public fashion datasets.
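The abstract describes two coupled components: a cross-domain hashing branch that embeds customer and shop photos into compact binary-like codes, and a sequential attribute-learning branch that predicts clothing attributes step by step. The paper itself is the authoritative reference; the sketch below is only a minimal, hypothetical illustration of that general architecture in PyTorch. The backbone choice (ResNet-18), the 48-bit code length, the LSTM attribute head, and the fixed number of attribute steps are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class CrossDomainHashNet(nn.Module):
    """Illustrative two-branch model (not the paper's DMCH): a shared CNN
    backbone maps consumer and shop photos into a common space, a hashing
    head produces relaxed binary codes via tanh, and an LSTM head predicts
    attributes as a sequence."""

    def __init__(self, code_len=48, num_attrs=1000, hidden=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Drop the classifier; keep conv layers + global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.hash_head = nn.Linear(512, code_len)       # relaxed hash codes
        self.attr_rnn = nn.LSTM(512, hidden, batch_first=True)
        self.attr_out = nn.Linear(hidden, num_attrs)    # one attribute per step

    def forward(self, images, steps=5):
        f = self.features(images).flatten(1)            # (B, 512)
        codes = torch.tanh(self.hash_head(f))           # (B, code_len), in (-1, 1)
        # Feed the image feature at every step; a real sequential model would
        # condition each step on the previously predicted attribute instead.
        seq = f.unsqueeze(1).repeat(1, steps, 1)        # (B, steps, 512)
        h, _ = self.attr_rnn(seq)
        attr_logits = self.attr_out(h)                  # (B, steps, num_attrs)
        return codes, attr_logits

def hamming_rank(query_codes, db_codes):
    """Rank database items by Hamming distance between sign-quantized codes."""
    q, d = query_codes.sign(), db_codes.sign()
    dist = (q.size(1) - q @ d.t()) / 2                  # Hamming distance for ±1 codes
    return dist.argsort(dim=1)

# Toy usage: embed a small batch and rank it against itself.
model = CrossDomainHashNet()
imgs = torch.randn(4, 3, 224, 224)
codes, attr_logits = model(imgs)
ranking = hamming_rank(codes, codes)
```

Hashing is what would produce the kind of efficiency gain the abstract reports: once codes are sign-quantized, ranking a large shop database reduces to cheap Hamming-distance comparisons rather than floating-point similarity search.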

URL

https://arxiv.org/abs/1904.02887

PDF

https://arxiv.org/pdf/1904.02887.pdf

