Omni-Scale Feature Learning for Person Re-Identification

2019-05-02 20:42:26
Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang

Abstract

As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call these features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning in ReID. This is achieved by designing a residual block composed of multiple convolutional feature streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses both pointwise and depthwise convolutions. By stacking such blocks layer-by-layer, our OSNet is extremely lightweight and can be trained from scratch on existing ReID benchmarks. Despite its small model size, our OSNet achieves state-of-the-art performance on six person-ReID datasets.
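
The two efficiency ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the pointwise-then-depthwise ordering, and the gate layout (global average pooling, one ReLU hidden layer, sigmoid output) are assumptions made for illustration; the abstract itself only states that the block combines pointwise and depthwise convolutions and that one shared gate produces input-dependent channel-wise weights.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def factorised_conv_params(c_in, c_out, k):
    """Weights in a 1x1 pointwise conv followed by a k x k depthwise conv.

    Assumption: pointwise comes first, so the depthwise conv has c_out filters.
    """
    return c_in * c_out + k * k * c_out


def gate_weights(x, w1, w2):
    """Hypothetical aggregation gate: one weight in (0, 1) per channel.

    x  : list of C channels, each a flat list of H*W activations
    w1 : hidden x C weight matrix, w2 : C x hidden weight matrix
    """
    pooled = [sum(ch) / len(ch) for ch in x]  # global average pooling
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]  # ReLU
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid


def fuse_streams(streams, w1, w2):
    """Sum the streams after scaling each by its own gate output.

    The same (w1, w2) is reused for every stream, which is what makes the
    gate "unified"; the weights still differ per stream because they are
    computed from each stream's own activations.
    """
    n_channels = len(streams[0])
    n_positions = len(streams[0][0])
    fused = [[0.0] * n_positions for _ in range(n_channels)]
    for x in streams:
        g = gate_weights(x, w1, w2)
        for c in range(n_channels):
            for i in range(n_positions):
                fused[c][i] += g[c] * x[c][i]
    return fused


if __name__ == "__main__":
    print(standard_conv_params(256, 256, 3))    # 589824
    print(factorised_conv_params(256, 256, 3))  # 67840
```

With 256 input and output channels, the factorised 3 x 3 layer in this sketch needs roughly 8.7 times fewer weights than the standard one, which gives a sense of why stacking such blocks can keep a network small.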

URL

https://arxiv.org/abs/1905.00953

PDF

https://arxiv.org/pdf/1905.00953.pdf

