Attention: A Big Surprise for Cross-Domain Person Re-Identification

2019-05-30 02:17:07
Haijun Liu, Jian Cheng, Shiguang Wang, Wen Wang

Abstract

In this paper, we focus on model generalization and adaptation for cross-domain person re-identification (Re-ID). Unlike existing cross-domain Re-ID methods, which leverage auxiliary information from unlabeled target-domain data, we aim to enhance model generalization and adaptation through discriminative feature learning, directly applying a pre-trained model to new domains (datasets) without using any information from the target domains. To address the discriminative feature learning problem, we find, surprisingly, that simply introducing an attention mechanism to adaptively extract person features for each domain is highly effective. We adopt two popular types of attention mechanisms: long-range-dependency-based attention and direct-generation-based attention. Both can attend over the spatial or channel dimension alone, or over a combination of the two. The structures of these different attention variants are illustrated in detail. Moreover, we incorporate the attention results into the final output of the model through skip connections, enriching the features with both high- and mid-level semantic visual information. When a pre-trained model is applied directly to new domains, this attention incorporation genuinely enhances model generalization and adaptation for cross-domain person Re-ID. We conduct extensive experiments across three large datasets: Market-1501, DukeMTMC-reID, and MSMT17. Surprisingly, introducing attention alone achieves state-of-the-art performance, substantially outperforming even those cross-domain Re-ID methods that utilize auxiliary information from the target domain.
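
The abstract names two attention families without showing their form. As a rough illustration only (not the paper's exact modules), here is a minimal PyTorch sketch of both, assuming an SE-style squeeze-and-excitation block for direct-generation channel attention and a non-local block for long-range-dependency spatial attention. All class names and hyperparameters (e.g., the reduction ratios) are hypothetical.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Direct-generation attention (SE-style sketch): channel weights are
    produced directly from globally pooled features, one scalar per channel."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # B x C x 1 x 1
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)                              # reweight channels

class NonLocalSpatialAttention(nn.Module):
    """Long-range-dependency attention (non-local-style sketch): every spatial
    position attends to every other position via pairwise affinities."""
    def __init__(self, channels, reduction=2):  # reduction ratio is an assumption
        super().__init__()
        inner = channels // reduction
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)
        self.value = nn.Conv2d(channels, inner, 1)
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # B x HW x C'
        k = self.key(x).flatten(2)                         # B x C' x HW
        v = self.value(x).flatten(2).transpose(1, 2)       # B x HW x C'
        attn = torch.softmax(q @ k, dim=-1)                # B x HW x HW affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                             # residual connection
```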
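The abstract also describes feeding attention results into the model's final output through skip connections, so the descriptor carries both mid- and high-level semantics. Below is a hypothetical sketch of that wiring around a ResNet-50 backbone, reusing the two attention modules above; the choice of stages (layer3/layer4) and fusion by concatenation are assumptions, not the paper's verified design.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AttentiveReIDNet(nn.Module):
    """Hypothetical wiring: attention at a middle stage and at the last stage,
    with the mid-stage attention result skip-connected into the descriptor."""
    def __init__(self, embed_dim=512):  # embedding size is an assumption
        super().__init__()
        base = resnet50(weights="IMAGENET1K_V1")
        self.stem = nn.Sequential(base.conv1, base.bn1, base.relu,
                                  base.maxpool, base.layer1, base.layer2)
        self.mid = base.layer3                     # 1024-ch mid-level features
        self.top = base.layer4                     # 2048-ch high-level features
        self.mid_attn = ChannelAttention(1024)     # from the sketch above
        self.top_attn = NonLocalSpatialAttention(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # fuse the pooled mid-level skip path with the top-level features
        self.embed = nn.Linear(1024 + 2048, embed_dim)

    def forward(self, x):
        mid = self.mid_attn(self.mid(self.stem(x)))
        top = self.top_attn(self.top(mid))
        skip = self.pool(mid).flatten(1)           # mid-level semantics
        high = self.pool(top).flatten(1)           # high-level semantics
        return self.embed(torch.cat([skip, high], dim=1))
```

In the directly-transferred setting the abstract describes, such a network would be trained on the source dataset only (e.g., Market-1501) and then evaluated as a fixed feature extractor on the target dataset (e.g., DukeMTMC-reID), with no use of target-domain information.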

URL

https://arxiv.org/abs/1905.12830

PDF

https://arxiv.org/pdf/1905.12830.pdf

