Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification

2020-02-09 12:33:23

Yifeng Ding, Shaoguo Wen, Jiyang Xie, Dongliang Chang, Zhanyu Ma, Zhongwei Si, Haibin Ling

arXiv_CV

arXiv_CV CNN Weakly_Supervised Classification Attention Pose

Abstract
Abstract (translated)
URL
PDF

Abstract

Classifying the sub-categories of an object from the same super-category (e.g. bird species, car and aircraft models) in fine-grained visual classification (FGVC) highly relies on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this paper, however, we show that by integrating low-level information (e.g. color, edge junctions, texture patterns), performance can be improved with enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of a) a pyramidal hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, and hence learns both high-level semantic and low-level detailed feature representation, and b) an ROI guided refinement strategy with ROI guided dropblock and ROI guided zoom-in, which refines features with discriminative local regions enhanced and background noises eliminated. The proposed AP-CNN can be trained end-to-end, without the need of additional bounding box/part annotations. Extensive experiments on three commonly used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach can achieve state-of-the-art performance. Code available at \url{this http URL}

Abstract (translated)

URL

https://arxiv.org/abs/2002.03353

PDF

https://arxiv.org/pdf/2002.03353.pdf