Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization

2021-06-11 16:09:31

Weichen Chen (1), Xinyi Yu (1), Linlin Ou (1) ((1) College of Information Engineering, Zhejiang University of Technology, Hangzhou, China)

arXiv_CV

arXiv_CV Recognition Attention Pose Action Surveillance

Abstract

Pedestrian attribute recognition in surveillance scenarios remains a challenging task because specific attributes are localized inaccurately. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which exploits the strong relevance between attributes and views to capture specific view-attributes and to localize attribute-corresponding areas through an attention mechanism. A specific view-attribute is composed of the extracted attribute feature and four view scores, which the view predictor outputs as the confidences for the attribute from different views. The view-attribute is then delivered back to the shallow network layers to supervise deep feature extraction. To explore the location of a view-attribute, regional attention is introduced to aggregate the spatial information of the input attribute feature along the height and width directions, constraining the image to a narrow range. Moreover, the inter-channel dependency of the view-feature is embedded in these two spatial directions. An attribute-specific attention region is obtained by refining the narrow range, balancing the ratio of channel dependencies between the height and width branches. The final view-attribute recognition outcome is obtained by combining the output of regional attention with the view scores from the view predictor. Experiments on the RAP, RAPv2, PETA, and PA-100K datasets demonstrate the effectiveness of our approach compared with state-of-the-art methods.
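
The abstract gives no formulas, so the following is only a minimal NumPy sketch of the two ideas it names: attention built from features pooled along the height and width directions, and a final prediction that weights per-view outputs by view scores. The function names, the channel-mixing matrices `w_h` and `w_w`, the sigmoid gating, and the softmax-weighted sum are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regional_attention(feat, w_h, w_w):
    """Hypothetical height/width attention over a (C, H, W) feature map.

    w_h and w_w are assumed (C, C) channel-mixing weights standing in for
    the inter-channel dependency embedded in each spatial direction.
    """
    pooled_h = feat.mean(axis=2)          # (C, H): average over width
    pooled_w = feat.mean(axis=1)          # (C, W): average over height
    attn_h = sigmoid(w_h @ pooled_h)      # (C, H) gate per row
    attn_w = sigmoid(w_w @ pooled_w)      # (C, W) gate per column
    # Outer product of the two gates narrows the response to a region.
    attn = attn_h[:, :, None] * attn_w[:, None, :]   # (C, H, W)
    return feat * attn, attn

def view_weighted_prediction(attr_logits, view_scores):
    """Assumed combination: softmax over view scores, then a weighted sum.

    attr_logits: (V, A) attribute logits per view; view_scores: (V,).
    """
    weights = np.exp(view_scores - view_scores.max())
    weights /= weights.sum()
    return sigmoid(weights @ attr_logits)  # (A,) attribute probabilities

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 4))      # C=8, H=6, W=4
w_h = rng.standard_normal((8, 8))
w_w = rng.standard_normal((8, 8))
attended, attn = regional_attention(feat, w_h, w_w)

# Four views, matching the four view scores mentioned in the abstract.
probs = view_weighted_prediction(rng.standard_normal((4, 5)),
                                 rng.standard_normal(4))
```

In this sketch the product of the row and column gates plays the role of the narrowed region; how the paper balances the height and width branches is not specified in the abstract and is not modeled here.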

URL

https://arxiv.org/abs/2106.06485

PDF

https://arxiv.org/pdf/2106.06485.pdf